FROM THE EDITOR

This is a second issue devoted to selected papers presented at the International Conference (SAE2005) on Challenges in Statistics Production for Domains and Small Areas. The Conference was held at the University of Jyväskylä, Finland, in August 2005. The first fifteen papers were published in the December 2005 issue of the journal (vol. 7 Number 3, December 2005).

The first part of this issue contains eight papers under the general title Challenges in Statistics Production for Domains and Small Areas (II), presented at the SAE2005 Conference, with an introduction from the Guest Editor. Prof. Risto Lehtonen, Helsinki University, Finland, agreed to serve as a Guest Editor for these two special issues. I would like to express my highest gratitude and thanks for his efficient work.

The papers have been reviewed by the following referees: Timo Alanko, Ray Chambers, Kari Djerf, Stefano Falorsi, Dan Hedlin, Montserrat Herrador, Seppo Laaksonen, Nick Longford, Domingo Morales, Mikko Myrskylä, Kari Nissinen, Kaja Sõstra, Imbi Traat, Ari Veijanen, Li-Chun Zhang and the Guest Editor. I would like to thank all of them very much for their effort in improving the quality of the papers.

The second part of the issue, entitled Other Articles, contains the following four papers:

1. Exact Distribution of the Natural ARPR Estimator in Small Samples from Infinite Populations (by Ryszard Zieliński from Poland). The paper is connected with the European Commission Eurostat document Doc. IPSE/65/04/EN page, where the "at-risk-of-poverty rate" (ARPR) is defined as the fraction of persons in a given population with an equivalised disposable income smaller than p percent (p = 60) of the q-th population quantile (q corresponding to the population median). In the paper the author presents the exact distribution of the estimator as well as exact formulas for its expectation and its variance. Numerical examples illustrate the results for some population distributions.

2. Multivariate Sample Allocation: Application of Random Search Method (by Marcin Kazak from Poland). The author discusses sample allocation between strata or domains. The convergence of the algorithm is presented using simulation studies for two artificial populations and real agricultural data. The application of the method is presented using data from a survey of micro enterprises conducted by the Central Statistical Office of Poland.

3. Remarks on Using the Polish LFS Data and SAE Methods for Unemployment Estimation by County (by Jan Kubacki from Poland). The author presents a synthetic overview of recent efforts related to the small area estimation methods applied to the Polish Labour Force Survey (PLFS). In the paper the author discusses various methods of estimation together with an evaluation of the quality of such estimation, related in particular to the type of auxiliary data used for borrowing strength and the efficiency of the initial estimates used in the models.

4. Comparisons of Three Product-Type of Estimators in Small Sample (by Arun K. Singh, Lakshmi N. Upadhyaya and Housila P. Singh from India). The authors propose a class of unbiased product-type estimators for estimating the population mean using an auxiliary variable in single phase sampling. Expressions for the bias and mean square error of the proposed class of estimators are derived in small samples assuming a linear model, and its exact efficiency is compared with that of the usual unbiased estimator. The minimum mean square error of the proposed class of estimators is also derived. To simplify the discussion, the authors confine themselves to simple random sampling and assume the population size to be infinite.

The editor announces with the deepest sorrow that Prof. Mikołaj Latuch, President of the Polish Statistical Association (1981-1985), Professor Emeritus of Warsaw School of Economics, died in October 2005. The Polish Statistical Association, founded in 1912, had different periods of its activity. The activity of the Association was interrupted several times by the World Wars and for political reasons, and the Association resumed its permanent activities in 1981. Prof. Latuch was the first President of the Association after its reactivation in 1981, and significantly contributed to its development.

Jan Kordos
The Editor

STATISTICS IN TRANSITION, March 2006, Vol. 7, No. 4

FROM THE GUEST EDITOR

This Special Issue of Statistics in Transition continues the publication of papers coming from the SAE2005 Conference Challenges in Statistics Production for Domains and Small Areas, which was held in August 2005 in Jyväskylä, Finland. The first set of papers was published in the December 2005 issue of the journal.

We start with contributions given in the panel discussion session, Future Challenges of Small Area Estimation. The aim of the session, as summarised by the chairman, was to attempt to identify the future challenges in implementing SAE methods in European National Statistics Institutes (NSIs), to identify future theoretical challenges in small area estimation (SAE) and to identify future challenges in the training of official statisticians in the theory and application of SAE methods. The contributors Jan van den Brakel, Dan Hedlin, Li-Chun Zhang, Ray Chambers and Risto Lehtonen address these points both from a research point of view and a more practical point of view.

In their paper, Michele D'Alò, Loredana Di Consiglio, Stefano Falorsi and Fabrizio Solari incorporate spatial correlation structures in a small area estimation procedure of the Italian poverty rate and obtain improved estimation when compared with more standard methods. Grazyna Dehnel and Elzbieta Golata present results on the use of administrative data sources and indirect estimation techniques for the estimation of basic economic statistics for small businesses in certain population subgroups in Poland. Piero Demetrio Falorsi, Danilo Orsini and Paolo Righi discuss balanced sampling and coordinated sampling designs for small area estimation in an important case of a large number of domains of interest. Wojciech Gamrot proposes the use of two-phase or double sampling for the estimation of domain totals under unit nonresponse.
Jan Paradysz and Tomasz Klimanek apply the methods developed in the context of the EURAREA project for the estimation of some important business statistics in Poland. Krystyna Pruska studies in her paper some aspects of the application of logistic regression models for small area estimation problems. In the final paper Jan Kordos identifies several different factors that have had a significant impact on research activities on small area estimation in Poland. To these I would add the international conferences organised by Prof. Kordos, in 1992 in Warsaw and in 1999 in Riga. These conferences have surely also stimulated, more broadly, methodological development and the application of new methods in estimation for domains and small areas.

I would like to inform readers that a further conference, the SAE2007 Conference on Small Area Estimation, is planned to take place in Pisa, Italy, 3-5 September 2007. For more information on this event readers are advised to contact Monica Pratesi or Nicola Salvati of the University of Pisa.

Several persons (in addition to the Editor and Guest Editor) have served as reviewers of the SAE2005 Conference papers published in this and the previous Special Issue of the journal: Timo Alanko, Ray Chambers, Kari Djerf, Stefano Falorsi, Dan Hedlin, Montserrat Herrador, Seppo Laaksonen, Nick Longford, Domingo Morales, Mikko Myrskylä, Kari Nissinen, Kaja Sõstra, Imbi Traat, Ari Veijanen, and Li-Chun Zhang. I want to express my sincere thanks to all these people for an excellent collaboration. Last but not least, I once again address thanks to Jan Kordos, the Editor of Statistics in Transition, for publishing the SAE2005 Conference papers in the journal.

Risto Lehtonen
The Guest Editor

FUTURE CHALLENGES OF SMALL AREA ESTIMATION

Ray Chambers, Jan van den Brakel, Dan Hedlin, Risto Lehtonen and Li-Chun Zhang

ABSTRACT

A panel discussion session entitled Future Challenges of Small Area Estimation was organized at the SAE2005 Conference, with Ray Chambers as the organizer and chair and Jan van den Brakel, Dan Hedlin, Marie Cruddas, Risto Lehtonen, Imbi Traat and Li-Chun Zhang as the discussants. The output of the panel discussion session is summarized in this paper. Each contributor is responsible for the respective piece of text; the paper has been edited by the chairman of the session.

Key words: Implementation of small area estimation methods, NSI staff training, Theoretical development in small area estimation.

The Eurarea project (Heady and Ralphs, 2005) sponsored the very successful SAE2005 conference held at the University of Jyväskylä from 27 to 31 August 2005. The last session at this conference was a panel session that addressed future challenges of small area estimation, especially in a European context. The aim of the session was threefold: to attempt to identify the future challenges in implementing SAE methods in European National Statistics Institutes (NSIs), to identify future theoretical challenges in small area estimation (SAE) and to identify future challenges in the training of official statisticians in the theory and application of SAE methods.

The panellists were almost equally divided between practitioners and theoreticians and comprised (in alphabetic order) Jan van den Brakel of Statistics Netherlands, Ray Chambers of the University of Southampton, Marie Cruddas of the Office for National Statistics (ONS), Dan Hedlin of Statistics Sweden, Risto Lehtonen of the University of Jyväskylä, Imbi Traat of the University of Tartu and Li-Chun Zhang of Statistics Norway. After the session, the panellists were asked to provide transcripts of their discussion pieces to be published together as a record of the session and also to highlight the important directions for future development of small area estimation methods in the European context. Five discussants provided material for publication, and these are set out in this article. Three of these (Van den Brakel, Hedlin and Zhang) are employed by European NSIs and can be considered as representing the interests of practitioners as far as future SAE development is concerned, while two (Chambers and Lehtonen) are academics working on SAE problems and hence take a more theoretical perspective.

Jan van den Brakel

Currently, the application of SAE at Statistics Netherlands is limited. This appears to be the situation in most European NSIs, with the exception of the ONS. Several factors are responsible for the slow adoption of these methods. One is that the methodology is intellectually and practically inaccessible (Heady and Ralphs, 2005). Indeed the statistical theory is rather complex and the extant software at NSIs is often not suitable to conduct the required calculations in a straightforward manner, which hampers the implementation in survey processes. Another is the fact that small area estimation procedures are predominantly model based. Many NSIs are rather reserved in the application of model-based estimation procedures and generally rely on the more traditional design-based or model-assisted procedures for producing their official statistics. The approximate design unbiasedness of the generalized regression estimator is an attractive and important property for NSIs, since their primary objective is to publish figures about one finite target population.
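The design-based properties referred to here can be illustrated with a small simulation. The following Python sketch draws repeated simple random samples from a hypothetical finite population and compares the Horvitz-Thompson estimator of a total with a generalized regression estimator assisted by a simple linear model. The population, the sample size, the number of replications and the function name `compare_ht_and_greg` are all assumptions made purely for illustration; none of them comes from a Statistics Netherlands survey.

```python
import random
import statistics


def compare_ht_and_greg(pop_size=2000, n=50, reps=2000, seed=1):
    """Monte Carlo comparison of HT and GREG estimators of a total.

    Sampling design: simple random sampling without replacement.
    All settings are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    # Hypothetical finite population: y is roughly linear in the auxiliary x.
    x = [rng.uniform(0.0, 10.0) for _ in range(pop_size)]
    y = [5.0 + 2.0 * xi + rng.gauss(0.0, 2.0) for xi in x]
    x_total, y_total = sum(x), sum(y)
    weight = pop_size / n  # design weight = 1 / inclusion probability

    ht_estimates, greg_estimates = [], []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)
        xs = [x[i] for i in sample]
        ys = [y[i] for i in sample]
        ht_y = weight * sum(ys)  # Horvitz-Thompson estimate of the y total
        ht_x = weight * sum(xs)  # Horvitz-Thompson estimate of the x total
        # Least-squares slope of the assisting model y = a + b*x.
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum((xi - x_bar) ** 2 for xi in xs)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
        slope = sxy / sxx
        # Under this design the estimated population size equals pop_size,
        # so the intercept drops out of the regression adjustment.
        greg = ht_y + slope * (x_total - ht_x)
        ht_estimates.append(ht_y)
        greg_estimates.append(greg)

    return {
        "ht_rel_bias": statistics.mean(ht_estimates) / y_total - 1.0,
        "greg_rel_bias": statistics.mean(greg_estimates) / y_total - 1.0,
        "ht_variance": statistics.pvariance(ht_estimates),
        "greg_variance": statistics.pvariance(greg_estimates),
    }


if __name__ == "__main__":
    for name, value in compare_ht_and_greg().items():
        print(f"{name}: {value:.4g}")
```

With a population in which the assisting model fits well, as assumed here, both estimators should show a relative bias close to zero, while the regression estimator should have a clearly smaller Monte Carlo variance.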
If the underlying linear model of the generalized regression estimator explains the variation of the target parameter in the finite population reasonably well, then this might result in a reduction of the design variance relative to that of the Horvitz-Thompson estimator. If the model is misspecified, then this results in an increase of the design variance, but the property that the generalized regression estimator is approximately design unbiased remains. From this point of view, the generalized regression estimator is robust against model misspecification. Moreover, design-based procedures are almost as efficient as model-based procedures if sample sizes are large.

Besides accuracy, timeliness is also an important quality aspect for official statistics. Model-based small area estimators require careful model selection and evaluation, since model misspecification easily results in severely biased estimates. Furthermore, surveys contain a large number of variables, and generally different models are required for each. Together with the lack of experience with these model-based estimation procedures, it is felt that the implementation of small area estimation procedures might result in an unacceptable delay of the survey process. This is in contrast with the generalized regression estimator, which is often used to produce one set of weights for the estimation of all the target parameters of a sample survey.

Generalized regression estimators, however, have relatively large design variances in the case of small sample sizes. Small sample sizes arise if estimates are required for very detailed geographic or social demographic classifications. Small sample sizes also arise if timely estimates are required, resulting in a lack of time to collect a sufficient amount of data. As a consequence, a project has been started to implement small area estimation in Statistics Netherlands survey processes. Several applications have been identified. For example, in the Dutch Labour Force Survey there is a strong demand for monthly estimates of numbers of employed and unemployed. The monthly sample size of this survey, however, is too small to produce sufficiently reliable monthly estimates with the generalized regression estimator. There is also a strong demand for quarterly and annual estimates at a very detailed regional level (municipalities and even smaller).

Another important application for small area estimation is the production of timely short-term economic indicators. Since there is a strong demand for these timely indicators, many NSIs work with provisional releases that are based on the data obtained in the first part of the data collection period. From this point of view small area estimation can improve the timeliness of a survey process. Note that the objective of these surveys is the estimation of one parameter only, namely turnover. Consequently the delay associated with model evaluation is manageable. In this application there is the additional problem that estimates obtained from the early respondents cannot be considered as a probability sample.

Most surveys conducted by NSIs operate continuously in time and are based on cross-sectional or rotating panel designs.
Consequently, SAE procedures that borrow strength from data collected in the past as well as cross-sectional data from other small areas are particularly interesting. For example, the estimation procedure based on structural time series models for repeated surveys, proposed by Pfeffermann (1991) and Pfeffermann and Burck (1990), has high practical value. This approach borrows strength in time and space and can be made robust against model misspecification by benchmarking the sum of the small area estimates to the direct estimates at an aggregate level. Since this property provides a built-in mechanism against model misspecification, it partially meets the objection that SAE delays the survey process. Furthermore, it is possible to specify models that explicitly account for the rotating panel design of the survey, resulting in more efficient estimates for the population parameters. The approach can also be extended to account for rotation group bias, i.e. systematic differences between the parameter estimates of the successive panel waves, and yields estimates for the trend and seasonal components of the population parameter. As a result, seasonally adjusted parameter estimates and their estimation errors are obtained as a byproduct. Finally, the model can be used to forecast population parameters beyond the sample period for which direct survey estimates are available. As a consequence, this procedure fits into a framework for producing timely short-term statistics. At the start of a data-collection period, the model yields forecasts for the population parameters for time periods for which no survey data are available (this is sometimes called nowcasting). When new survey data become available, timely preliminary and final estimates can be made which are based on information from data collected in the past and from data collected in neighboring areas. This results in a smooth conversion from predicted values to provisional releases to final releases.

It can be concluded that there are clear applications for SAE in European NSIs, and it is very likely that they will eventually adopt this methodology. It is a task of R&D departments to recognize in which situations SAE provides additional value compared with standard direct estimation procedures. Over time it can be expected that survey practitioners will build up sufficient experience with this relatively new methodology and implement it in tools that can be used smoothly in the survey process.

Dan Hedlin

There are some current trends among NSIs that make SAE more attractive and realistic. These are the pressure to reduce costs while simultaneously easing the response burden for businesses and organisations, the increasing demand for small area statistics and a growing commitment to methodology, in particular applied survey methodology. Also, there are common factors at NSIs that are particularly germane to SAE. In particular, NSIs produce statistics on a large-scale basis, largely adopt a design-based view on estimation and need to play safe. In particular, gross errors in published statistics are to be avoided as far as possible. The need for NSIs to proceed cautiously should not be sniffed at. All NSIs rely on their reputation for accuracy and trustworthiness. The point estimates need to be right.
This has caused a certain apprehension about SAE. The design-based view widely adopted by most NSI practitioners makes them approach SAE gingerly. Again, the picture is rather more complex: simple forms of SAE have been in use for a long time at NSIs. So while research is and should be in the forefront, implemented practices at NSIs are and should be lagging a bit behind, but they should not be too far behind!

I would like to echo Carl-Erik Särndal's comment in his speech at the conference that there is too little talk about bias. However, there is a difference between what I would tentatively call random bias and systematic bias, although random bias sounds like an oxymoron and systematic bias like a tautology. With random design-bias I refer to the kind of bias that we would model with a symmetric or nearly symmetric distribution around zero. This is essentially another side of the variance coin. To give a flavour of the idea of random bias, think of a simulation study of estimates for 500 domains (not an unusual number of domains in official statistics) where the 500 empirical biases are found to be fairly symmetrically distributed with zero average. Systematic bias is systematically positive or negative (as in winsorisation). In official statistics the point estimates are just as important as the interval estimates since many users will use the former in their analyses. Having systematic bias in official statistics is in my view quite bad. Think of a consumer price index series that is always too low due to systematic bias. Then people who have their wages tied to the consumer price index will always be paid too little. Imagine the reaction if that became public knowledge.

Since NSIs produce statistics on a large-scale basis and need to do so with reasonable reliability, SAE (similarly to other model-based estimation methods) needs to be robust. It is not feasible for an NSI to check all models in great detail. I believe the way to go is more extensive research into robustness and diagnostics for SAE. We need a set of easily used diagnostics that help producers of statistics focus on the potential problems. One can compare here with the area of editing, where one strand of research has focused on methods that highlight large errors and large error probabilities. At the Jyväskylä conference some very interesting findings and research were presented, all very promising, but there is still scope for further progress.

Li-Chun Zhang

As a methodologist working in an NSI, I would like to focus my discussions more narrowly on the "future challenges for small area estimation in the official statistical system". In doing so, I venture to ask some questions about one immediate and one somewhat more distant future.
Despite considerable developments in the use of models for small area estimation in the past two decades or so, I am not aware of any official statistics produced by Statistics Norway that are genuinely based on such models. Much of the reason for this has to do with two things, namely, usefulness and usability. Clearly, these are related issues, because a highly useful/desirable output may render the required technique acceptable even if it is much more complex than what one is accustomed to, whereas the usefulness of a usable technique is more readily 'discovered' than otherwise. It is perhaps helpful here to draw comparisons to the development of seasonal adjustment in roughly the same period, where the combination of usefulness and usability seems to have brushed away all theoretical objections to the practice.

It seems to me that for the immediate future we need to dig more deeply into the usefulness of the SAE techniques that are now at our disposal. What are the motivations for routinely producing small area (or domain) statistics? Is it possible to get a single set of estimates that satisfy all these purposes? If not, what are the most important uses? Sensible fund allocation? Snap-shot National Accounts? Identify them and we will make ourselves more interesting to the users if we are able to deliver to these demands.

It also seems to me that the usability of SAE techniques will be greatly improved by the production of a flexible, standard software package that can be used to implement them. The analogy with the software packages that are routinely used for seasonal adjustment is obvious. Personally, I would adopt a smoothing technique perspective here, supplemented with explicit model-based diagnostics. Some of the recent developments in nonparametric approaches to the problem of small area estimation seem interesting in this respect.

In the preceding paragraphs I have tried to pick up a few clues for the near future, based on reflections of the immediate past. I shall now take the exercise one step further: to look further ahead I will first take a look further back. The first half of the twentieth century witnessed a profound transition from almost exclusively census-based statistical production to modern survey sampling. Data collection has become much more dynamic and flexible, and NSIs are consequently much more powerful information providers. However, sample-based statistical production is not without limitations, and the increasing demand for detailed statistics is one of the most pressing issues.

It is difficult to predict the future because we are never fully aware of the potential means nor the needs that await us. Still, a statistical system that is able to combine the details of a census and the timeliness of a survey, together with its economy of resources and perhaps higher quality of data, seems attractive. Here it seems to me that the Scandinavian countries have an option in their rich administrative data sources. Register-based census-like statistics are already a reality for a number of key variables. The question is how to strengthen information by combining data sources.
In spite of its relationship to the use of auxiliary information for survey samples, the concept of combining data sources that I am addressing here is quite different. For one thing, it will no longer be possible for us to confine ourselves to the so-called design-based framework, because it leads nowhere when it comes to the prediction of a non-sample value, which is the most detailed statistic one can possibly ask for.

The more distant future of small area estimation needs to be considered in relation to this general direction of development. The current approach to small area estimation does not seem adequate. Indeed, the way in which the term small area estimation is used today seems too restrictive in this respect. For instance, is the concept of an "indirect estimator" that "borrows strength" across the whole data set a peculiarity of small area estimation? What is prediction if not 'borrowing strength' from the observed 'indirect data' for the unobserved value? In other words, one should not treat small area estimation as a separate, exotic branch of official statistical production. Rather, we want it to become an integrated part of our statistics system, with a common philosophy and methodological foundation. The extent to which we succeed in this can be judged by how naturally small area statistics are being produced in the future and, in return, the way small area statistics are being produced can be taken as a measure of how powerful our statistics system is.

Ray Chambers

I would like to focus my discussion on two viewpoints about the challenges ahead for SAE. The first is what I call an Academic's Perspective, relating as it does to new developments in the methodology of SAE that are either currently being researched or that I expect to become part of the SAE research agenda in the near future.

Probably the most important of these new developments is the extension of SAE methods to heterogeneous populations, where the usual homogeneity and normality assumptions implicit in many of the mixed models in use today are inappropriate. An obvious example is a business survey population, reflecting the fact that small area methods for business surveys represent an important new area of application for SAE. Another important development is SAE methods for population distributions, including percentiles. For many continuous variables the real user interest is in the distribution of this variable within each of the small areas, not its average value. For example, the lack of an income question in the 2001 census in the UK has led to a demand for income distributions for small areas to be estimated from survey data instead.

Concerns about the strong assumptions implicit in most modern SAE methods will inevitably lead to more attention being paid to less model-dependent alternatives. In this context I note the development of alternatives to the standard area effect approach to modelling the distribution of variables across small areas. For example, Tzavidis and Chambers (2005) describe the use of quantile regression methods in SAE. These methods offer the promise of resolving one major problem associated with the use of standard mixed models in SAE: the fact that the area effects in the model are specific to the geography defined by the small areas of interest.
As a consequence, any change in the boundaries of these areas requires the model to be re-specified and refitted, with no guarantee that the area effects estimated under the new geography bear any relationship to those estimated under the old one. This is one aspect of what geographers refer to as the modifiable areal unit problem, and brings into sharp relief the issue of how an area effect in SAE should be interpreted. Another spin-off from use of these alternative approaches to SAE is that nonparametric models are easily implemented, offering the promise of model-robust SAE.

Much of the research into SAE over the last decade has focussed on methods of estimation of the mean square error of the estimates. The approximation-based methods now in use work well provided the underlying model assumptions are valid. However, there is never any guarantee that this is true in any particular SAE application. Bootstrap-based methods offer an important avenue for calculating more robust estimates of mean square error. At present, these methods rely on assumptions about the distribution of area effects. Development of fully nonparametric bootstrap methods for the mixed models used in SAE will be an important step forward.

Diagnostics for model selection in SAE remains a largely unexplored area of tremendous practical importance for NSIs considering uptake of SAE methods. My experience with developing methods for estimating UK unemployment at local authority level indicated clearly the concern that practitioners have about model selection. At present SAE methods represent the main avenue for application of model-based methods in official statistics. It would be a tremendous shame if lack of appropriate diagnostics for robust model-building prevented NSIs from taking advantage of the efficiencies inherent in these methods. Linked to this is the question of how official statistical collections should be designed if small area/domain estimates are going to be an important output. Some progress in this area was achieved in the Eurarea project, but much remains to be done.

Finally, there is one aspect of SAE research where very little is known, but which may be very important in some practical applications. This is inference for informative domains. By an informative domain here I mean one defined by dividing a population in a way that depends directly on the variable of interest. An example is where domains are defined by the percentiles of the population distribution of a variable equal to or closely related to the variable of interest. Standard mixed models are inappropriate for this type of situation. Another, quite common, situation is where only some of the small areas have sample data. In this situation we are completely dependent on the assumption that the areas that have been missed are like the areas with data. Why should this be so? If some form of random sampling has dictated whether an area is sampled or not, then fair enough.
However, I have seen examples of SAE in the US where the areas are States, and where allocation of sample to a State has been based on practical considerations. Here we have to ask ourselves whether inference for the missing States based on SAE models is even possible. Pfeffermann and Sverchkov (2005) address the issue of SAE where sample inclusion probabilities depend on the values of the variable of interest. It will be interesting to see whether their ideas can be extended to deal with informative domains.

Finally, I briefly turn to what I call a Consultant's Perspective on the real challenge ahead for SAE. This is easily described, but much less easily achieved. It is an SAE toolkit that works well in all the situations where an NSI would require SAE capability. Is such a toolkit possible? The Eurarea project has produced a toolkit that aims to be of help but is clearly not comprehensive. We need to build on this effort.
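Two of the themes in this contribution, the standard area effect model and bootstrap estimation of mean square error, can be made concrete with a small sketch. The Python code below assumes a deliberately simple area-level model with an intercept only, known sampling variances, a moment estimator of the area-effect variance, and normally distributed area effects in the bootstrap. The function names and the input numbers are invented for illustration. It is a sketch of the general ideas under these stated assumptions, not the Eurarea toolkit or any NSI's production method.

```python
import random


def shrinkage_estimates(y, psi):
    """EBLUP-type estimates under an intercept-only area-level model.

    y:   direct estimates, one per area
    psi: sampling variances of the direct estimates (assumed known)
    """
    m = len(y)
    y_bar = sum(y) / m
    s2 = sum((yi - y_bar) ** 2 for yi in y) / (m - 1)
    # Moment estimator of the area-effect variance, truncated at zero.
    a_hat = max(0.0, s2 - sum(psi) / m)
    # Weighted (generalized least squares) estimate of the common mean.
    w = [1.0 / (a_hat + p) for p in psi]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Shrinkage weights: areas with noisy direct estimates lean on mu_hat.
    gammas = [a_hat / (a_hat + p) for p in psi]
    est = [g * yi + (1.0 - g) * mu_hat for g, yi in zip(gammas, y)]
    return est, gammas, mu_hat, a_hat


def bootstrap_mse(y, psi, reps=500, seed=0):
    """Parametric bootstrap MSE, assuming normal area effects and errors."""
    rng = random.Random(seed)
    _, _, mu_hat, a_hat = shrinkage_estimates(y, psi)
    m = len(y)
    squared_error = [0.0] * m
    for _ in range(reps):
        # Generate bootstrap "true" area values and direct estimates.
        theta = [mu_hat + rng.gauss(0.0, a_hat ** 0.5) for _ in range(m)]
        y_star = [t + rng.gauss(0.0, p ** 0.5) for t, p in zip(theta, psi)]
        # Re-estimate everything, including the variance component.
        est_star, _, _, _ = shrinkage_estimates(y_star, psi)
        for i in range(m):
            squared_error[i] += (est_star[i] - theta[i]) ** 2
    return [s / reps for s in squared_error]


if __name__ == "__main__":
    # Invented direct estimates and sampling variances for eight areas.
    y = [10.0, 14.0, 9.0, 20.0, 12.0, 16.0, 7.0, 15.0]
    psi = [1.0, 4.0, 0.5, 9.0, 2.0, 3.0, 1.5, 6.0]
    est, gammas, mu_hat, a_hat = shrinkage_estimates(y, psi)
    mse = bootstrap_mse(y, psi)
    for i, (e, g, v) in enumerate(zip(est, gammas, mse)):
        print(f"area {i}: estimate={e:.2f} weight={g:.2f} mse={v:.2f}")
```

The bootstrap here generates area effects from a normal distribution, which is exactly the kind of distributional assumption that a fully nonparametric bootstrap would avoid.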

Risto Lehtonen

I discuss here three inter-related topics that I feel are important both for producers of official statistics for domains and small areas in the European context and for academics who develop methods and tools for this purpose. These are implementation of the techniques and tools developed in the Eurarea project, further research and development needs, and the role of Eurostat.

Techniques and computational tools developed in the Eurarea project are well documented and are accessible via the internet. The toolbox available from this source includes model-dependent techniques as the main approach, based on variants of the EBLUP, and also contains computational tools written in the SAS language. Many of the techniques make quite strong assumptions about data availability. While strong auxiliary data are beneficial for domain and small area estimation in general, the most efficient techniques use unit-level models and assume access to spatial and/or temporal auxiliary information.

There are important developments underway in many European countries that improve the options for implementing these techniques in statistics production for domains and small areas. Examples are efforts to build and improve national statistical infrastructures that allow a combined use of register data and sample survey data for statistics production, the use of unique identification keys that allow the construction of combined databases and unit-level register panels, and the inclusion of spatial coordinate-based data in population register data sources. As is well known, there is a long tradition in the Scandinavian countries in this respect, Finland being a good example of a country where most regional statistics are register-based (e.g. Tammilehto-Luode, 2005; Statistics Finland, 00 and 2005). Similar features are becoming visible in many other European countries, as was illustrated in a number of the SAE2005 conference presentations.
And in some countries, national R&D projects have been launched by the NSI aiming at improving the implementation of small area estimation techniques for statistics production (e.g. Heady et al., 2000). Such projects may exist and probably will be launched in other countries as well. At this moment, the extent of the implementation of (or plans to implement) Eurarea methods and tools for domain and small area statistics production in European NSIs is not well known, but examples of at least testing phase projects are available and some of them were described in the SAE2005 conference. These include the regional estimation of ILO employment and unemployment statistics in Finland, Poland, Spain and the UK, and the estimation of regional economic statistics for small businesses in Poland. Other examples are the estimation of regional income and poverty statistics in Finland, Italy and Lithuania. It can be expected that the examination of the potentials of the Eurarea tools and their use for regional social and economic statistics production will expand, even in the near future. There thus is a need to support NSIs in their R&D and implementation procedures. This would require for example even more detailed documentation of the Eurarea methods and tools, and development of procedures for NSI staff training in the use of the methods in practical applications. And to better meet the current and future needs, more research is needed, and the current computational tools should be revised and updated using new research results. Some examples are the following. To better facilitate the actual complex sampling designs that are used by many NSIs, a pseudo EBLUP (Rao, 2003) will shortly be implemented in the EBLUPGREG macro developed in Statistics Finland (another estimator with a similar aim is the calibrated EBLUP, see Chandra and Chambers, 2005). Extension of this to non-linear models such as logistic mixed models for binary and polytomous response variables can be expected. Eventually I expect the design-based model-assisted GREG family of SAE techniques to have an extensive coverage, including GREG estimators that use linear and logistic mixed models as assisting models (e.g. Lehtonen, Särndal and Veijanen, 2003 and 2005). There are at least two options for making further progress at the European level. The first option is launching a new European project aiming at coordinating domain and small area estimation methods and tools development and helping NSIs in their implementation and staff training. This would require EU funding. The second option is building up and maintaining a network for R&D in domain and small area estimation, connecting experts of NSIs and universities, and perhaps even Eurostat (see Mladý, 2005), supplemented with national and Eurostat R&D activities and projects. This might be more realistic when compared to the first option, but requires funding for coordination.
Fortunately, research into domain and small area estimation is still proceeding in many universities and NSIs, as was successfully demonstrated in the SAE2005 Conference presentations.

REFERENCES

CHANDRA, H. and CHAMBERS, R. (2005). Comparing EBLUP and C-EBLUP for small area estimation. Statistics in Transition 7.
HEADY, P., CLARKE, P., BROWN, G., D'AMORE, A. and MITCHELL, B. (2000). Small area estimates derived from surveys: ONS central research and development programme. Statistics in Transition 4.
HEADY, P. and RALPHS, M. (2005). EURAREA: an overview of the project and its findings. Statistics in Transition 7.
LEHTONEN, R., SÄRNDAL, C.-E. and VEIJANEN, A. (2003). The effect of model choice in estimation for domains, including small domains. Survey Methodology 29.
LEHTONEN, R., SÄRNDAL, C.-E. and VEIJANEN, A. (2005). Does the model matter? Comparing model-assisted and model-dependent estimators of class frequencies for domains. Statistics in Transition 7.
MLADÝ, M. (2005). Regional labour market statistics at a European level small number of survey respondents. Proceedings of the SAE2005 Conference.
PFEFFERMANN, D. (1991). Estimation and seasonal adjustment of population means using data from repeated surveys. Journal of Business and Economic Statistics 9.
PFEFFERMANN, D. and BURCK, L. (1990). Robust small area estimation combining time series and cross-sectional data. Survey Methodology 16.
PFEFFERMANN, D. and SVERCHKOV, M. (2005). Small area estimation under informative sampling. Statistics in Transition 7.
RAO, J.N.K. (2003). Small Area Estimation. Hoboken: Wiley.
STATISTICS FINLAND (00). Population Census 2000. Helsinki: Statistics Finland, Handbooks 35c.
STATISTICS FINLAND (2005). Main Lines of Research Policy in Helsinki: Statistics Finland.
TAMMILEHTO-LUODE, M. (2005). Register-based statistics and geographic information. Proceedings of the SAE2005 Conference.
TZAVIDIS, N. and CHAMBERS, R. (2005). Bias-adjusted small area estimation with M-quantile models. Statistics in Transition 7.

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4, pp

SMALL AREA ESTIMATION OF THE ITALIAN POVERTY RATE

Michele D'Alò, Loredana Di Consiglio, Stefano Falorsi, Fabrizio Solari

ISTAT, PSM/A, via Magenta 4, 00185 Roma, Italy, {dalo, diconsig, stfalors, solari}@istat.it

ABSTRACT

This work aims to compare the performances of some standard small area methods with an enhanced method using a spatial autocorrelation structure, in the estimation of the Italian poverty rate in unplanned domains. In order to evaluate the properties of the methods under analysis, a simulation study has been carried out drawing samples using Monte Carlo techniques. Furthermore, two different sets of auxiliary variables have been evaluated in the study. The empirical distributions of the estimates have been used to evaluate biases and mean square errors of the estimators by computing suitable synthetic evaluation criteria. The comparison of the results shows that the spatial model based estimator performs better than the other estimators.

Key words: Poverty Rate, Small Area Estimation, Linear Mixed Model, Spatial Autocorrelation.

1. Introduction

This paper illustrates the results of a study on the estimation of the relative poverty rate at NUTS3 (province) level. At present, the Italian National Statistical Institute (ISTAT) disseminates poverty rate estimates at national level, for three geographical areas (North, Centre and South) and, starting from 00, also for regions (NUTS2). Estimation of the poverty rate is based on data collected with the Consumer Expenditure Survey (CES), whose sampling design plans the sample size at NUTS2 level; NUTS3 provinces are therefore unplanned domains, i.e. they may have small or even zero sample size. The currently applied estimator, which is a generalized regression estimator (GREG), does not guarantee reliable estimates for NUTS3. For this reason different small area estimation methods have been taken into account in order to improve the efficiency of provincial estimates. The performances of the estimators have been evaluated by means of a simulation study using Monte Carlo techniques in two different informative contexts. The first framework is given by the standard set of auxiliary variables used in ISTAT for household surveys, that is counts of household members cross-classified by sex and age; the second consists of the classification of households by means of some socio-economic variables. In Section 2 we describe the sampling strategy of CES, while in Section 3 a brief account of the small area estimators under study is given. Then, Section 4 illustrates the empirical study and in Section 5 the main outcomes of the study are reported. Finally, in Section 6 concluding remarks are drawn.

2. Sampling Strategy

2.1. Sampling Design

The Italian poverty rate is estimated on data collected with CES. To clarify the sampling design of the survey, first note that Italy is partitioned into twenty regions (NUTS2), each of them further divided into provinces (NUTS3). At the time of the study the total number of provinces was 103. The sampling design of CES is a two-stage stratified sampling plan. The municipalities (primary sampling units or PSUs) are stratified according to their size, measured in terms of number of inhabitants. From each stratum three municipalities are drawn with probability proportional to their size. Households (secondary sampling units or SSUs) are selected systematically in each sample municipality. Each member of the selected households is included in the sample. The sampling design is defined at regional level, i.e. sample size is planned to obtain reliable regional estimates of consumer expenditure (by group of expenditure) and stratification is carried out within regions without taking into account provincial boundaries.
Therefore, since provinces do not represent planned domains, their sample sizes may be very small or even zero.

2.2. Target Variable

The relative poverty rate is defined as the proportion of households whose equalised household consumer expenditure $q^e$ is below the level of the poverty line. The poverty line is given by the per capita consumer expenditure

$$Q = \frac{1}{N} \sum_{d=1}^{D} \sum_{i=1}^{M_d} q_{di},$$

where $N$ is the population size, $M_d$ is the number of households in domain $d$ (regions or provinces) and $q_{di}$ is the consumer expenditure of household $i$ in area $d$. The equalised household consumer expenditure is obtained by dividing the household consumer expenditure $q$ by a coefficient function $\lambda(a)$ depending on the household size $a$. In particular, the utilised values of $\lambda(\cdot)$ are: $\lambda(1) = 0.6$, $\lambda(2) = 1$, $\lambda(3) = 1.33$, $\lambda(4) = 1.63$, $\lambda(5) = 1.9$, $\lambda(6) = 2.15$ and $\lambda(a) = 2.4$ for $a \ge 7$ (ISTAT, 00). Then, the relative poverty rate in region $d$ is given by

$$Y_d = \frac{1}{M_d} \sum_{i=1}^{M_d} y_{di} = \frac{1}{M_d} \sum_{i=1}^{M_d} I(q^e_{di}; Q), \qquad (1)$$

$I(q^e_{di}; Q)$ being

$$I(q^e_{di}; Q) = \begin{cases} 1 & \text{if } q^e_{di} < Q, \\ 0 & \text{otherwise.} \end{cases}$$

2.3. Direct Estimation of the Relative Poverty Rate

Until 00, ISTAT disseminated annual estimates of the poverty rate only at national level and for three large geographical areas (North, Centre and South) using the GREG estimator (Deville and Särndal, 1992), which allowed reasonable sampling errors for the domains of interest. The auxiliary variables were the cross-classification of sex and age. Starting from 00, ISTAT has disseminated estimates of the poverty rate also for the Italian regions. The results of the study, which has been undertaken for the selection of an estimator of the regional poverty rate (Di Consiglio et al., 2003), have confirmed the GREG estimator as the best choice in terms of trade-off between bias and mean square error (MSE). In particular, the GREG estimator, currently applied to produce estimates of the poverty rate, is obtained as a solution of a linear programming problem with constraints given by the known totals for the cross-classification of sex by age classes (0-14, 15-29, 30-59, 60+). However, this estimator cannot guarantee satisfactory sampling errors when applied for the estimation of the relative poverty rate in provinces, which are unplanned domains for CES. The study in Di Consiglio et al. (2003) represents the basis of the current work for the choice of an estimator and the selection of auxiliary variables for NUTS3 poverty rate estimation.
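The definition of the target variable in Section 2.2 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the equivalence coefficients are placeholder values for illustration rather than an authoritative copy of the official scale, and the input figures in any call are invented.

```python
import numpy as np

# Equivalence coefficients by household size; sizes of 7 or more share one value.
# ASSUMPTION: placeholder coefficients for illustration, not the official scale.
SCALE = {1: 0.60, 2: 1.00, 3: 1.33, 4: 1.63, 5: 1.90, 6: 2.15, 7: 2.40}


def equivalised(q, size):
    """Divide each household expenditure by the coefficient for its size."""
    q = np.asarray(q, dtype=float)
    capped = np.minimum(np.asarray(size, dtype=int), 7)
    lam = np.array([SCALE[int(a)] for a in capped])
    return q / lam


def poverty_line(q, n_persons):
    """Per capita expenditure: total household expenditure over population size N."""
    return float(np.sum(np.asarray(q, dtype=float))) / float(n_persons)


def poverty_rate(q_e, line):
    """Share of households whose equivalised expenditure lies below the line."""
    return float(np.mean(np.asarray(q_e, dtype=float) < line))
```

A domain-level rate is obtained by calling `poverty_rate` on the households of one domain while keeping the poverty line computed on the whole population.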

3. Small Area Estimators

A first analysis of the possible alternative small area estimators has been limited to a subset of the standard small area estimators, which have been studied in the EURAREA project (EURAREA Consortium 2004, Project Reference Volume). In particular, design based estimators such as direct and GREG estimators, and model based estimators such as synthetic and EBLUP have been considered. The latter class contains estimators based on both unit and area level mixed models. Furthermore, an EBLUP based on a unit level mixed model with spatially correlated area effects has been taken into account (Chambers et al., 2004). Here, the small area estimators considered in the analysis are briefly recalled; for a fully detailed account see Rao (2003), or Heady et al. (2004) for their delineation in the EURAREA project.

The direct estimator is defined as follows

$$\hat{Y}_d = \frac{1}{\hat{M}_d} \sum_{i \in s_d} w_{di}\, y_{di}, \qquad (2)$$

where $w_{di}$ is the sampling weight and $\hat{M}_d = \sum_{i \in s_d} w_{di}$ is an estimate of the total number of households in area $d$.

The GREG is based on the standard linear model $y_{di} = x_{di}^T \beta + e_{di}$ with $e_{di} \sim iid\ N(0, \sigma^2)$ and it is given by

$$\hat{Y}_d^{GREG} = \frac{1}{\hat{M}_d} \sum_{i \in s_d} w_{di}\, y_{di} + \Big( \bar{X}_d - \frac{1}{\hat{M}_d} \sum_{i \in s_d} w_{di}\, x_{di} \Big)^T \hat{\beta}, \qquad (3)$$

with $\hat{\beta}$ the weighted least squares estimate of the regression coefficient vector $\beta$ and $\bar{X}_d = (\bar{X}_{d,1}, \ldots, \bar{X}_{d,p})^T$ the p-dimensional vector of the covariate means.

The synthetic SI_A and the EBLUP EB_A are based on a standard linear mixed model with unit-specific auxiliary variables, random area-specific effects and errors independently normally distributed

$$y_{di} = x_{di}^T \beta + u_d + e_{di}, \quad u_d \sim iid\ N(0, \sigma_u^2), \quad e_{di} \sim iid\ N(0, \sigma_e^2). \qquad (4)$$

The synthetic SI_A is expressed as

$$\hat{Y}_d^{SI\_A} = \bar{X}_d^T \hat{\beta}, \qquad (5)$$

while the EBLUP EB_A is

$$\hat{Y}_d^{EB\_A} = \hat{\gamma}_d \big( \bar{y}_d + (\bar{X}_d - \bar{x}_d)^T \hat{\beta} \big) + (1 - \hat{\gamma}_d)\, \bar{X}_d^T \hat{\beta}, \qquad (6)$$

$\bar{y}_d$ being the sample poverty rate, $\bar{x}_d$ the corresponding sample mean of the covariates and $\hat{\gamma}_d = \hat{\sigma}_u^2 / (\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_d)$. The estimation of the variance components is based on restricted maximum likelihood (REML) (see Cressie, 1992).

Furthermore, the synthetic SI_B and the EBLUP EB_B estimators are based on a standard linear mixed model using area-specific auxiliary variables

$$\bar{y}_d = \bar{x}_d^T \beta + \xi_d, \quad \xi_d \sim iid\ N(0, \sigma_u^2 + \sigma_e^2 / n_d).$$

The synthetic estimator SI_B is

$$\hat{Y}_d^{SI\_B} = \bar{X}_d^T \hat{\beta} \qquad (7)$$

and the expression for the EBLUP EB_B is

$$\hat{Y}_d^{EB\_B} = \hat{\gamma}_d\, \hat{Y}_d + (1 - \hat{\gamma}_d)\, \bar{X}_d^T \hat{\beta}, \qquad (8)$$

where $\hat{\gamma}_d = \hat{\sigma}_u^2 / (\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_d)$ and $\hat{\sigma}_e^2$ is the pooled estimator of the individual variance (see Heady et al., 2004).

Finally, a spatial autocorrelation structure for the random effect component of the unit level model (4) has been considered

$$y_{di} = x_{di}' \beta + u_d + e_{di}, \quad u \sim MN(0, \sigma_u^2 A), \quad e \sim MN(0, \sigma_e^2 I_N). \qquad (9)$$

Setting $\delta_{dd'} = 0$ if $d = d'$ and $\delta_{dd'} = 1$ otherwise, the matrix $A$ depends on the distances among areas and on an unknown scale parameter, $\alpha$, connected to the spatial correlation among the areas

$$A = [a_{dd'}] = \Big[ \big( 1 + \delta_{dd'} \exp( dist(d, d') / \alpha ) \big)^{-1} \Big].$$

The standard case (4) is a special case of (9) with correlation structure given by $A = I_D$. The BLUP (B_SP) estimator for model (9) is given by

$$\hat{Y}_d^{B\_SP} = \frac{1}{M_d} \Big\{ y_d + (X_d - x_d)' \hat{\beta} + (M_d - m_d) \sum_{d'=1}^{D} \tau_{d,d'} \big[ y_{d'} - x_{d'}' \hat{\beta} \big] \Big\}, \qquad (10)$$

where, for each area $d$, $y_d$ is the number of households below the poverty line in the sample, $X_d$ is the vector of the known population totals of the auxiliary variables and $x_d$ is the vector of sample totals of auxiliary variables. Moreover, $\tau_{d,d'}$ is the $(d, d')$ element of the matrix

$$T = [\tau_{d,d'}] = \big( z_s' z_s + \varphi^{-1} A^{-1} \big)^{-1} = \big( Diag[n_d] + \varphi^{-1} A^{-1} \big)^{-1},$$

being $\varphi = \sigma_u^2 / \sigma_e^2$. The EBLUP EB_SP is obtained by estimating the variance components by the REML method. For a detailed description of the model and the estimation algorithm see Chambers et al. (2004).

4. Empirical Study

4.1. Choice of Auxiliary Information

A complete definition of the estimators of Section 3, which are based on statistical models, requires the specification of the auxiliary variables. In particular, the study described in this paper focuses on two different sets of auxiliary information. The first set of variables taken into account (in the following referred to as Case A) is given by the number of household members belonging to sex-age classes. These classes are defined as in the GREG estimator applied in ISTAT for the estimation of the regional poverty rate (see Section 2). The second set of auxiliary variables consists of a partition of the households into homogeneous groups with respect to equalised consumer expenditure by means of socio-economic variables. Among the most important variables used for the classification are region, household size, household employment rate, age of the household head and maximum educational level of the members. The specification of the homogeneous groups has been accomplished by means of CART (Classification and Regression Trees, Breiman et al., 1984) on CES data (for the details of the study see Di Consiglio et al., 2003).
The totals needed for the application of the small area estimators have been derived from data collected with the Labour Force Survey (LFS), a much larger scale survey whose sample size is designed to guarantee pre-fixed sampling errors of provincial estimates.
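The building blocks of the estimators recalled in Section 3 can be sketched in Python as follows. This is a minimal illustration, not the EURAREA software: all function names are hypothetical, the variance components are taken as given rather than estimated by REML, and the spatial matrix assumes the form in which each off-diagonal element equals 1 / (1 + exp(dist / alpha)) and each diagonal element equals 1.

```python
import numpy as np


def direct_estimate(y, w):
    """Weighted sample mean: sum(w * y) / sum(w)."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def shrinkage_weight(sigma2_u, sigma2_e, n_d):
    """gamma_d = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_d)."""
    return sigma2_u / (sigma2_u + sigma2_e / n_d)


def composite_estimate(direct, synthetic, gamma):
    """Convex combination gamma * direct + (1 - gamma) * synthetic."""
    return gamma * direct + (1.0 - gamma) * synthetic


def spatial_matrix(dist, alpha):
    """A[d, d'] = 1 / (1 + delta * exp(dist / alpha)), with delta = 0 on the diagonal."""
    dist = np.asarray(dist, dtype=float)
    delta = 1.0 - np.eye(dist.shape[0])
    return 1.0 / (1.0 + delta * np.exp(dist / alpha))
```

The shrinkage weight approaches 1 as the area sample size grows, so the composite estimate moves from the synthetic component towards the direct estimate in well-sampled areas.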

Simulation Study

In order to compare the estimators of Section 3, a simulation study has been carried out. The performances of the estimators have been evaluated by means of a Monte Carlo simulation study drawing R = 000 samples from a pseudo-population, according to the two-stage sample design (municipalities-households) applied to CES. The pseudo-population has been generated from CES databases (1997-2001) and administrative data according to the following steps:
1. each record of the datasets has been replicated a number of times equal to its sampling weight (divided by five since the complete database consists of five different CES samples);
2. the municipality population is created by drawing a number of records equal to the municipality population size from the records produced in step 1 within the stratum to which the municipality belongs (municipality size given by administrative records on municipalities, year 00), so that the clustering structure of the population is preserved.

For the evaluation of the small area estimators the software developed by the EURAREA project has been applied (downloadable at eurarea). The empirical distributions, assessed on the basis of the R replications, are used to evaluate bias and MSE of the estimators, computing standard synthetic evaluation criteria (for applications to small area estimator comparison see e.g. Rao and Choudhry, 1995 and Higgins and Ralphs, 2004). For each small area, the value of the parameter of interest $Y_d$ is evaluated using the $y_{di}$ values in the pseudo-population. Let $r$ ($r = 1, \ldots, R$) be the generic replicate and let $\tilde{Y}_d^r$ denote the r-th estimate of the poverty rate in area $d$ by means of the generic estimator $\tilde{Y}_d$. The evaluation criteria used in this study are the Relative Bias (RB) and the Relative Root Mean Square Error (RRMSE), expressed respectively as

$$RB_d = \frac{1}{R} \sum_{r=1}^{R} \frac{\tilde{Y}_d^r - Y_d}{Y_d} \cdot 100, \qquad RRMSE_d = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \Big( \frac{\tilde{Y}_d^r - Y_d}{Y_d} \Big)^2 } \cdot 100.$$
A synthesis of the area level criteria can be obtained by averaging the above criteria over the areas or, under a conservative approach, by taking the maximum value.
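As an illustration of the area level criteria, the following Python sketch computes RB and RRMSE from a matrix of replicated estimates. The layout of the input (replicates in rows, areas in columns) and the function names are assumptions made for the example.

```python
import numpy as np


def relative_bias(estimates, truth):
    """RB per area in percent; estimates has shape (R, D), truth has shape (D,)."""
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return 100.0 * np.mean((est - truth) / truth, axis=0)


def relative_rmse(estimates, truth):
    """RRMSE per area in percent, from the same (R, D) matrix of replicates."""
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return 100.0 * np.sqrt(np.mean(((est - truth) / truth) ** 2, axis=0))
```

Both functions divide by the true area value, so areas whose true rate is zero would need separate handling.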

The first group of overall criteria is the average over all the areas of the absolute values of RB (Average Absolute Relative Bias, AARB) and of the values of RRMSE (Average Relative Root Mean Square Error, ARRMSE)

$$AARB = \frac{1}{D} \sum_{d=1}^{D} |RB_d|, \qquad ARRMSE = \frac{1}{D} \sum_{d=1}^{D} RRMSE_d.$$

The second group of overall evaluation criteria is given by the maximum values of absolute RB (Maximum Absolute Relative Bias, MARB) and of RRMSE (Maximum Relative Root Mean Square Error, MRRMSE) over all the areas

$$MARB = \max_d |RB_d|, \qquad MRRMSE = \max_d RRMSE_d.$$

Analysis of the Results

In this section the results of the empirical study carried out for the estimation of the Italian poverty rate at province level for the two sets of auxiliary variables are reported. The analysis of Tables 1 and 2 shows that, with the exception of the direct estimator and GREG, which perform better in terms of bias, the EB_SP, which takes into account the spatial correlation of the area random effects, produces the best results for all the evaluation criteria, especially with the auxiliary variables of Case A. In Case B, instead, EB_A and EB_B display the best results in terms of AARB and ARRMSE, though their maximum values of bias and MSE are much larger than those of EB_SP. The synthetic estimators SI_A and SI_B perform worst for both sets of auxiliary variables. Generally, the spatial estimator gives the best results, showing robustness with respect to the different sets of covariates used to specify the model. The performance of the estimators under the two different sets of variables, Case A and Case B, can also be compared through the analysis of Figures 1 and 2. These figures display the RB and RRMSE for each province (province size on the x-axis), in Case A and Case B, respectively.
The set of variables seems to be less relevant for GREG and EB_SP, both in terms of bias and in terms of errors, while different behaviours can be detected for the other estimators, in particular for the synthetic estimators. This evidence confirms that the second set of covariates is more correlated with the target variable than the first set.
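The four overall criteria used in the comparison can be obtained from the area level values with a few lines of Python. The sketch below assumes that RB and RRMSE have already been computed for every area; the function name and the dictionary output are illustrative choices.

```python
import numpy as np


def overall_criteria(rb, rrmse):
    """Summarise per-area RB and RRMSE (both in percent) over all areas."""
    rb = np.asarray(rb, dtype=float)
    rrmse = np.asarray(rrmse, dtype=float)
    return {
        "AARB": float(np.mean(np.abs(rb))),   # average absolute relative bias
        "ARRMSE": float(np.mean(rrmse)),      # average relative root MSE
        "MARB": float(np.max(np.abs(rb))),    # maximum absolute relative bias
        "MRRMSE": float(np.max(rrmse)),       # maximum relative root MSE
    }
```

Reporting the maxima next to the averages is what exposes estimators that perform well on average but fail badly in a few areas.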

Table 1. Case A: Relative bias, relative error: mean and maximum values
Columns: ESTIMATOR, AARB, ARRMSE, MARB, MRRMSE
Rows: DIRECT, GREG, SI_A, SI_B, EB_A, EB_B, EB_SP

Table 2. Case B: Relative bias, relative error: mean and maximum values
Columns: ESTIMATOR, AARB, ARRMSE, MARB, MRRMSE
Rows: GREG, SI_A, SI_B, EB_A, EB_B, EB_SP

One of the features of the EB_SP is that the introduction of a spatial correlation structure among the area effects allows it to capture the spatial distribution of the phenomenon under study. This is shown in Figures 3 and 4, reporting the spatial distribution in classes of relative bias over the territory for all the estimators in Case A and Case B, respectively. In each map, five classes have been used, each of them having as extreme points the 20th percentiles of the distribution of the relative bias of the estimator examined. Different scales on the maps have been used in order to show the strength of spatial correlation of relative bias for each estimator. Under the hypothesis of absence of spatial correlation, the relative bias is expected to be randomly distributed over the territory. This is plausible for the spatial estimator EB_SP in Figure 3, while the other two estimators based on the unit level model, SI_A and EB_A, show a clear spatial pattern. Furthermore, the other standard estimators display a spatial pattern of the relative bias as well, even if less clear than that observed for SI_A and EB_A. Analogous considerations hold for the set of variables of Case B, as shown in Figure 4, though the spatial pattern is less evident than before. This is likely due to the explicit use of a geographic variable (geographical region) for the definition of the second set of covariates, which allows a better explanation of the spatial distribution of the phenomenon under study.
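The class boundaries used for the maps can be sketched as follows, assuming that the five classes are delimited by the 20th, 40th, 60th and 80th percentiles of the relative bias of one estimator. The function name is hypothetical and the percentile interpolation is NumPy's default, which may differ from the rule used for the published maps.

```python
import numpy as np


def quintile_classes(rb):
    """Assign each area to one of five classes (0 to 4) bounded by the quintiles of rb."""
    rb = np.asarray(rb, dtype=float)
    # Four interior break points split the distribution into five classes.
    breaks = np.percentile(rb, [20, 40, 60, 80])
    return np.digitize(rb, breaks)
```

Because the breaks are computed separately for each estimator, the resulting maps use different scales, as noted in the text.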

6. Conclusions

The spatial EBLUP EB_SP provides satisfactory results and it seems to be quite robust to the choice of auxiliary variables (this aspect, of course, turns out to be very important in the absence of suitable external information). Further improvements can be expected in the behaviour of estimators which make use of distances among areas, by considering in the covariance structure more appropriate functions of distance. In fact, the EB_SP is based on Euclidean distance among areas, which is more suitable for physical than socio-economic phenomena. In this particular case, instead, it is likely that better results can be obtained by means of different metrics such as roadway or travel time distances among areas. Furthermore, SI_A and EB_A are based on the assumption of simple random sampling, whereas in our case the sampling design is complex; a pseudo-EBLUP may help to improve estimation based on unit level models. Finally, it can also be underlined that the results obtained from the comparison of the performance of small area estimators for the estimation of the poverty rate confirm the results on the application of the same methods for the estimation of the ILO Unemployment Rate at local labour market area level (see D'Alò et al., 2004).

Figure 1. Relative Bias of the estimators in Case A and Case B.

Figure 2. Relative Root MSE of the estimators in Case A and Case B.

Figure 3. Relative Bias geographical distribution in Case A

Figure 4. Relative Bias geographical distribution in Case B

REFERENCES

BREIMAN L., FRIEDMAN J., OLSHEN R., STONE C. (1984). Classification and Regression Trees, Wadsworth, Belmont, CA.
CHAMBERS R., SAEI A., FALORSI S., D'ALÒ M., SOLARI F., RUSSO A., DJERF K., SOSTRA K., VEIJANEN A., NISSINEN K., LEHTONEN R., HEADY P., HIGGINS N., RALPHS M. (2004). Linear models that borrow strength across time and space. In Project Reference Volume, Vol., C4.- C4.59.
CRESSIE N. (1992). REML estimation in empirical Bayes smoothing of census undercount, Survey Methodology, Vol. 18.
D'ALÒ M., DI CONSIGLIO L., FALORSI S., SOLARI F. (2004). The impact of the auxiliary information in the estimation of unemployment rate at sub-regional level: further investigation on the Italian results in the EURAREA project. In Proceedings of the European Conference on Quality and Methodology in Official Statistics, Mainz, Germany, 24-26 May 2004.
DEVILLE J.C., SÄRNDAL C.E. (1992). Calibration estimators in survey sampling, Journal of the American Statistical Association, Vol. 87.
DI CONSIGLIO L., FALORSI S., PALADINI P., RIGHI P., SCAVALLI E., SOLARI F. (2003). Stimatori per piccole aree per le stime di povertà regionali, Rivista di Statistica Ufficiale, n. /2003.
EURAREA CONSORTIUM (2004). Project Reference Volume, Vol.,
HEADY P., HIGGINS N., RALPHS M., LONGFORD N. (2004). Linear model-based methods for small area estimation. In Project Reference Volume, Vol., C.-C.3.
HIGGINS N., RALPHS M. (2004). Practical evaluation criteria. In Project Reference Volume, Vol., B3.-Bb3.6.
ISTAT (00). La stima ufficiale della povertà in Italia, Collana Argomenti, n. 4.
RAO J.N.K. (2003). Small Area Estimation, John Wiley & Sons, Hoboken, New Jersey.
RAO J.N.K., CHOUDHRY G.H. (1995). Small area estimation overview and empirical study. In Business Survey Methods, Wiley, New York.

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4, pp

ATTEMPTS TO ESTIMATE BASIC INFORMATION FOR SMALL BUSINESS IN POLAND

Grazyna Dehnel, Elzbieta Golata

ABSTRACT

The paper presents first attempts to use administrative data sources and indirect estimation techniques to estimate basic economic information about small business in the joint cross-section of the Polish Classification of Economic Activities (PKD) and voivodships. The study objective, specified as accounting for and applying tax data for a more effective use of a survey of small businesses with up to 9 employees, was understood in a twofold manner. First of all, it was a verification of the hypothesis concerning the possibility of improving estimation precision in studies available to date. Secondly, it was intended as a possible extension of estimation scope by joint distribution by voivodship and economic activity (PKD division). The basic economic information, for the aim of this study, was limited to employment and revenues. The Horvitz-Thompson estimates in the joint cross-sections of PKD and voivodships are presented and compared with the results of indirect estimates: ratio synthetic, regression synthetic and composite. The properties of the estimators are discussed from the domain-specific point of view and combining all domains. Estimation precision characterizing economic activity of small enterprises is presented and analysed for different types of domains: PKD sections, regions and the joint cross-section of regions and economic activity. The results obtained in the study lead to the following conclusion. Application of indirect estimation to small business data requires consideration of the heterogeneity of its distribution. Nevertheless the results of the study present the practical possibilities and benefits of adopting the techniques of small area estimation to small business data in Poland.

Key words: small domain estimation, tax register, small business statistics.
Poznan University of Economics, Department of Statistics

1. Introduction

Enterprises in Poland are divided into three categories: big, medium and small, depending on their size measured by the number of employees. According to Polish regulations, big enterprises are all those which employ at least 50 persons. Enterprises with the number of employees from 10 to 50 are referred to as medium. And all those employing up to 9 people (inclusive) are called small business. Each category of enterprises is obliged to provide the Central Statistical Office (CSO) with information about their economic activity. But the scope of the information and the frequency of supplying the reports differ between the three groups. All the big enterprises are obliged to provide two reports each month: DG1 about economic activity and F01 about costs, revenue and financial result. Additional reports covering other topics are collected quarterly or yearly (SP-). All the big enterprises are bound to participate in the statistical reporting, so the whole population is covered. But for the medium and small enterprises a sample survey is conducted. A sample covering 0% of medium enterprises is randomly chosen and takes part in the statistical reporting. The units chosen for the sample provide similar reports to those of big entities. In the case of small business, the statistical reporting is obligatory only for a random sample of enterprises. The sample consists of 5% of economic units and the report SP-3, which is provided once a year, covers all topics of economic activity. Because of volatility in the economy in transition, a great fluctuation in small business can be observed. It results in a large extent of nonresponse in the study, which finally provides estimates mainly at national level (Zagozdzinska, 1996, Witkowska A. and Witkowski M., 1997). According to the Central Statistical Office there were 3.35 million enterprises in Poland in 00. The majority of them, that is .6 million, belonged to the small business category, employing up to 9 people.
Most small economic entities, that is .545 million, are self-employed individuals. And only less than 4% of small businesses possess legal corporate personality. Over 3. million people, that is about 0% of the Polish labour force, are employed in small business. They operate mainly on local markets; 40% of them run commercial activity. All the above points to the importance of small business enterprises in the Polish economy, especially for local labour markets and regional development (Pawlowska, 2005, Wieczorek, 2003). Although the small economic entities constitute the basis of the local economy, there is no information about this group of enterprises available at regional and local level. Given this restricted scope of information, the idea arose to use administrative records and small area techniques to estimate information about small business at regional level (Paradysz, 2004, Falorsi et al., 2000).

The study was aimed at taking into account and applying tax data for a more effective use of the SP-3 sample survey of small businesses employing up to 9 people. To achieve this aim, several more specific objectives and tasks were set:
1. Assessing the scope of applicability of small area estimation methods:
- an attempt to use synthetic ratio estimation using the tax register as auxiliary data source to make estimation on SP-3 survey data across domains defined as regions (voivodships) and categories of economic activity (PKD) combined,
- an attempt to use synthetic regression estimation,
- an attempt to use composite estimation.
2. Analysing the precision of small domain estimation results in comparison with those obtained by applying traditional methods.

The study results outlined in this article are part of a larger research project focused on developing techniques to estimate economic information for small businesses and are still in their experimental stage. The analysis refers to data for the year 00 from databases made available by the Central Statistical Office. It concerns also the Ministry of Finance tax register, which in limited scope is now available to public statistics. It was the first time that information about small enterprises was extended by simultaneous usage of three different data sources. It is also the first attempt to make use of tax register data in public statistics in Poland. Confidentiality of the data was specially secured. The study was carried out in the Statistical Office in Poznan in close cooperation between members of the Centre for Regional Statistics. The main data sources about small business units which were used in the study are (Paradysz and Klimanek, 2006):
- SP-3 survey conducted by the Central Statistical Office on a 5% random sample chosen using a stratified sampling design,
- Information from BJS, the Database of Statistical Units created by the Central Statistical Office on the basis of the economic units register called REGON,
- Ministry of Finance's tax register data.
The SP-3 is a survey carried out yearly on a 5% random sample of small enterprises employing up to 9 people. There were 4 thousand units chosen, but only a part of them responded (39.3% response rate). The information obtained in the survey concerns, among others, the following topics: costs, revenue, financial result, employment and investments. The Database of Statistical Units (BJS) was created in the Central Statistical Office (CSO) to serve as a frame in all surveys conducted by the CSO on the population of economic enterprises. BJS is constructed on the basis of the register of economic units (REGON) and includes additional information from other sources. It is updated every year and in 00 contained records. The Ministry of Finance Tax Register contained information about tax statements for economic entities: records for self-employed individuals (PIT) and records for businesses that possess legal corporate personality (CIT).

Estimations performed in the course of the study were made for three categories of domains defined according to the following cross-sections:
- regions: 6 domains identified as R,
- PKD sections of economic activity: domains identified as S,
- joint distribution by regions and PKD sections: 76 domains identified as R/S.

According to the administrative division of the country, the regions were defined as voivodships. The PKD sections are defined according to the Polish Classification of Economic Activities (PKD). The PKD classification was developed in 1990 (with further changes) on the basis of the statistical classification of economic activity in European Union countries, Nomenclature des Activities de Communaute Europeenne (NACE). The PKD classification is updated without delay in line with all new regulations introduced by the European Commission.

On the basis of SP-3 data, the CSO publishes the following information on the economic activity of small enterprises separately for the regions (domains referred to as R) and separately by PKD sections (domains referred to as S): number of economic entities, number of employees, costs and revenue. Due to the small sample size and the large percentage of nonresponse, no estimates are available for the joint cross-section by regions and kind of economic activity (domains referred to as R/S). In this situation our study aimed at improving the estimation precision at regional level (R) and by PKD sections (S), and at providing reliable estimates for the joint distribution by regions and PKD sections (R/S). As concerns the basic characteristics of economic activity, the following target variables were estimated:
- mean number of employees under a job contract per enterprise,
- mean monthly wage,
- mean monthly revenue per unit,
- mean monthly costs per unit.
Considering the limited scope of the paper and the need to ensure comparability with the results obtained using the findings of the EURAREA project, the estimates presented in this article are given only for one selected variable in selected cross-sections (see: Paradysz and Klimanek, 2006). The variable chosen for presentation is the mean monthly revenue per unit. The selected cross-sections include all domains at the regional level (R) and PKD sections (S), as well as some domains in the joint distribution by regions and economic activity (R/S). For this presentation the following two PKD sections were chosen: D (manufacturing) and G (wholesale and retail trade, repair of motor vehicles, motorcycles and personal and household goods) across all regions (voivodships). Other estimates were presented in a separate report for the Central Statistical Office and are available at the Centre for Regional Statistics.

2. Synthetic ratio estimation

We started the project convinced of the reliability and completeness of the tax register, which had just been made available for public statistics. The first approach was based on the assumption that gaining access to tax register data should considerably improve small business statistics at regional level. It was assumed that this particular source should be very reliable. Above all, its presumed completeness was to guarantee access to information unavailable until then, that is, in the cross-section by PKD sections at regional level (R/S), possibly even at NUTS 4 level, i.e. poviats.

During the first attempt to tap tax register data, use was made of the relationships observed in the register to de-aggregate SP-3 data by the joint distribution by regions and PKD categories (R/S). In the course of the research, however, it turned out that a considerable number of units drawn for the sample were not represented in the tax register. The database of tax records, statistical PIT (personal income tax), provided by the Central Statistical Office and matched with records from the database of statistical units BJS, was seriously incomplete. In fact, the nearly records obtained from the Central Statistical Office did not include 4% of records from the SP-3 survey. After matching records from all three sources (BJS, tax register and SP-3 survey), the final dataset was reduced to about 6 thousand units.

Nevertheless, the attempt to de-aggregate SP-3 data was undertaken. The de-aggregation was conducted on the assumption that the sum of total values for domains in the regional breakdown is equal to the total value at the country level estimated by means of a direct estimator (see: Bracha, 2003). Using direct estimates for regions (R), further de-aggregation was conducted across PKD divisions within particular regions (voivodships). At this stage synthetic ratio estimation was applied.
Since no individual data from the tax register were available to match those from the survey, domain-level relationships were used instead, according to the following modification of synthetic estimation (see: Ghosh and Rao, 1994; Bracha, 1996, pp. 60):

$$\hat{Y}'_{d(SYN,R)} = \frac{\hat{Y}_{(DIR)}}{X}\, X_d = r\, X_d = \hat{Y}_{(DIR)}\, f_d \qquad (1)$$

where:
- $\hat{Y}'_{d(SYN,R)}$ is the modified synthetic ratio estimator of the total value in domain d,
- $X_d$ is the total value of an auxiliary variable in tax registers in domain d,
- $X$ is the total value of an auxiliary variable in tax registers at regional level (R),
- $r = \hat{Y}_{(DIR)} / X$ is the ratio of the direct estimate of the total value of the estimated variable at regional level (R) (voivodship) to the total value of the auxiliary variable from the tax register at the same level of aggregation (R),
- $f_d = X_d / X$ is the ratio of the total value of the auxiliary variable in domain d to the total value of the auxiliary variable in the tax register at regional level,
- $\hat{Y}_{(DIR)}$ is the total value of the estimated variable based on direct estimation from the SP-3 survey at regional level (R).

An attempt to de-aggregate the SP-3 survey total direct estimates for regions by PKD sections (domains referred to as R/S) was not successful. The resulting estimates, especially for small domains (with small representation), turned out to be unreliable. For this reason they are not included in this article. The subsequent data analysis produced the following conclusions:

1. An approach involving disaggregating direct estimates at regional level (R, voivodships) to the lower-level domain of PKD sections within a region (R/S) is attractive in its simplicity.
2. In view of the incompleteness of tax register data, the fractions used as the basis for de-aggregation of the SP-3 survey data can be biased. As a result, there is a serious danger that the resulting estimates are biased too. The extent of this bias cannot be determined.
3. In estimating the number of employees and the mean monthly wage, BJS data were used as the source of auxiliary information to construct the proportion for de-aggregation of the SP-3 survey data. In this case, the extent to which the BJS database is up to date is crucial to the reliability of the resulting estimates.

The assumption concerning the completeness of the tax register was not met. In this situation, de-aggregating SP-3 data using the relations observed in the incomplete tax register was doomed to fail from the outset. It is beyond the scope of this paper and beyond our competence to discuss the reasons for the deficiency of the tax database.
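The modified synthetic ratio estimator defined above can be illustrated with a minimal Python sketch. The function name, the domain labels and all figures are hypothetical and serve only as an illustration. The sketch also assumes that the domains partition the region, so that the regional auxiliary total X is the sum of the domain totals.

```python
def synthetic_ratio_estimates(y_dir_region, x_domains):
    """De-aggregate a regional direct estimate across domains.

    y_dir_region: direct estimate of the regional total (survey based).
    x_domains: mapping domain label -> auxiliary register total X_d.
    Assumes the domains partition the region, so X is the sum of the X_d.
    """
    x_region = sum(x_domains.values())            # X: regional auxiliary total
    if x_region <= 0:
        raise ValueError("regional auxiliary total must be positive")
    r = y_dir_region / x_region                   # r = Y_DIR / X
    # Each domain receives r * X_d, which equals Y_DIR * f_d with f_d = X_d / X.
    return {d: r * x_d for d, x_d in x_domains.items()}


# Hypothetical figures for one region (illustration only, not study data).
register_totals = {"D": 400.0, "G": 500.0, "other": 100.0}
estimates = synthetic_ratio_estimates(2000.0, register_totals)
print(estimates)
```

By construction the domain estimates add up to the regional direct estimate. The sketch also shows why an incomplete register is harmful: if units are missing unevenly across domains, the shares X_d / X are distorted and the distortion passes directly into the domain estimates.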
The incompleteness of the tax register data may have several causes. The data sets obtained from the Ministry of Finance did not include information about all economic units. In particular, they did not contain 5% of units subject to tax reporting on the basis of the revenue and costs register and % of units subject to tax reporting on the basis of flat rate tax. It is also possible that some of the units selected for the sample are self-employed individuals with a limited scope of business operation and limited income who were not obliged to submit the PIT-5 tax report form and reported their combined income in an annual tax statement. The annual personal tax statement may not have been available for statistical purposes. Excluding a particular group of economic units can produce a significant bias in the present study. For purposes of future estimation, it would be extremely important for public statistics to have access to complete tax registers, i.e. those including all units. As a result, one could expect significant gains in estimation precision and a reduction of possible bias.

3. Synthetic regression estimation

Since the results obtained in the de-aggregating approach were hardly acceptable, it was necessary to use more refined methods of indirect estimation based on the model approach. In addition, the distribution of small companies by target variables turned out to be considerably skewed to the right, with high variation, high kurtosis and outliers. To tackle the problem, the following solutions were suggested. One involves applying robust regression or a logarithmic transformation in constructing the models. Other proposals concern moving the analysis up from the unit level to the domain level: territorial units (R), PKD categories (S) or combined domains (R/S). To avoid further reduction of the sample, it was necessary to incorporate auxiliary variables at the domain level, i.e. to construct area level models. For this purpose, synthetic regression estimation was used in two variations:
- adopting as explanatory variables the mean values of variables from auxiliary data sources: the BJS database and the Ministry of Finance tax register,
- adopting direct estimates of the mean values for domains on the basis of the SP-3 survey as explanatory variables.

To improve the estimation precision concerning the economic activity of small enterprises from the SP-3 survey, additional data from the registers were used. As auxiliary data sources we used the BJS register and the tax register.
Values of variables from an auxiliary data source were treated as equivalent to those of the population, in spite of the inadequacies of the administrative registers discussed above. The tax register contains information on the amount of:
- revenue,
- income,
- costs,
- loss.

The Database of Statistical Units (BJS), apart from identification variables, contains only information about the number of employees. Such a situation can involve units that do not bring profit exceeding the amount that is exempt from tax.

In the course of the study several versions of models were constructed using various combinations of explanatory variables. Given the restricted set of auxiliary information (the 5 variables listed above) and taking into account the model evaluation, it was decided to include all the available additional data as explanatory variables in the models. The synthetic regression estimator can be presented in the following form:

$$\hat{Y}_{d(SYN,REG)} = X_d \hat{\beta} \qquad (2)$$

where:
- $\hat{Y}_{d(SYN,REG)}$ is the synthetic regression estimator of the mean value of the estimated variable in domain d,
- $\hat{\beta}$ is the vector of regression coefficients,
- $X_d$ is the mean value of an explanatory variable for domain d specified on the basis of an auxiliary data source (tax register or BJS).

The variance of the synthetic regression estimator $V(\hat{Y}_{d(SYN,REG)})$ was estimated on the basis of Bracha's study (2004, pp. 34 ff.).

4. Composite estimation

Having obtained estimates by means of a synthetic regression estimator, a composite estimator was constructed. For each domain (separately for each breakdown selected) a linear combination was calculated of the direct estimate and the corresponding regression estimate obtained from the model. Choosing the weight describing the contribution of each of the two components is a controversial question. One frequently adopted solution is $\gamma_d = n_d / N_d$; others are discussed in articles by Ghosh and Rao (1994) and Holmoy and Thomsen (1998). The optimum value of the weight $\gamma_d$ depends on the values of the MSE of the direct and synthetic regression estimators. These are unknown and should be estimated on the basis of the sample results.

The composite estimator is a weighted mean of two estimators. One type of composite estimator is produced by combining a direct estimator with a synthetic estimator. This is done in order to balance the bias of the synthetic estimator against the instability of the direct estimator.
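Returning briefly to the synthetic regression estimator of the previous section, the area level idea can be sketched in Python as follows. This is a deliberately simplified illustration and not the models fitted in the study: it assumes a single explanatory variable plus an intercept, ordinary least squares, and hypothetical domain-level figures.

```python
def fit_simple_regression(x_means, y_direct):
    """Ordinary least squares of domain direct estimates on one covariate.

    x_means: domain means of the auxiliary variable (register based).
    y_direct: direct estimates of the domain means (survey based).
    Returns (intercept, slope). Single covariate is a simplifying assumption.
    """
    n = len(x_means)
    if n != len(y_direct) or n < 2:
        raise ValueError("need at least two domains with matching inputs")
    mean_x = sum(x_means) / n
    mean_y = sum(y_direct) / n
    sxx = sum((x - mean_x) ** 2 for x in x_means)
    if sxx == 0:
        raise ValueError("auxiliary means must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_means, y_direct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def synthetic_regression(x_mean_d, intercept, slope):
    """Synthetic estimate for a domain: the model prediction at its auxiliary mean."""
    return intercept + slope * x_mean_d


# Hypothetical domain-level data (illustration only).
aux_means = [1.0, 2.0, 3.0, 4.0]
direct_means = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(aux_means, direct_means)
y_syn_new = synthetic_regression(5.0, b0, b1)
```

The synthetic estimate uses only the fitted coefficients and the auxiliary mean of the domain, which is why it can be produced even for domains with few or no sampled units, at the price of bias when the model does not fit that domain.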
The idea of optimising the weights $\gamma_d$, which determine the contribution of the two components, comes down to minimising the MSE of the composite estimator $\hat{Y}_{d(COM)}$, assuming that $\mathrm{cov}(\hat{Y}_{d(DIR)}, \hat{Y}_{d(SYN,REG)}) = 0$. Formally this can be written in the form:

$$\hat{Y}_{d(COM)} = \gamma_d \hat{Y}_{d(DIR)} + (1 - \gamma_d)\, \hat{Y}_{d(SYN,REG)} \qquad (3)$$

where:
- $\gamma_d$, $0 \le \gamma_d \le 1$, is the weight within the range (0; 1) keeping $MSE(\hat{Y}_{d(COM)})$ down,
- $\hat{Y}_{d(DIR)}$ is the direct estimator of the estimated parameter from the SP-3 survey,
- $\hat{Y}_{d(SYN,REG)} = X_d \hat{\beta}$ is the synthetic regression estimator of the mean value of the estimated variable in domain d.

The weight $\gamma_d$ was obtained with the formula (see: Rao, 2003, pp. 79; Kordos, Paradysz, 2000):

$$\gamma_d = \frac{V(\hat{Y}_{d(SYN,REG)})}{V(\hat{Y}_{d(DIR)}) + V(\hat{Y}_{d(SYN,REG)})} \qquad (4)$$

where:
- $V(\hat{Y}_{d(DIR)})$ is the variance of the direct estimator $\hat{Y}_{d(DIR)}$,
- $V(\hat{Y}_{d(SYN,REG)})$ is the variance of the synthetic regression estimator.

The estimated variance of the composite estimator was obtained using the variances of the components, substituting their estimated values for the unknown variances (Kordos, Paradysz, 2000):

$$\hat{V}(\hat{Y}_{d(COM)}) = \frac{\hat{V}(\hat{Y}_{d(EXP)})\, \hat{V}(\hat{Y}_{d(SYN,REG)})}{\hat{V}(\hat{Y}_{d(EXP)}) + \hat{V}(\hat{Y}_{d(SYN,REG)})} \qquad (5)$$

Here the subscript (EXP) appears to refer to the direct (expansion) estimator, i.e. the same component that is denoted (DIR) in formulas (3) and (4).

In assessing the quality of indirect estimation, use was made of the deff coefficient, a relative measure of precision suggested by L. Kish (2003). The coefficient is defined by the formula:

$$\mathrm{deff}(\hat{Y}_{d(COM)}) = \frac{\hat{V}(\hat{Y}_{d(COM)})}{\hat{V}(\hat{Y}_{d(DIR)})} \qquad (6)$$

If the deff coefficient is less than 1, it measures the gain in estimation precision when the composite estimator is used instead of the direct estimator.

5. Empirical analysis

We start the presentation of the results with a discussion of the estimation of the average revenue per enterprise in two cross-sections, that is by regions (R) and by division of economic activity (PKD sections, S). This is the level of aggregation for which the direct estimates are reliable and published by the Central Statistical Office.

(i) We can observe that the composite estimates are less differentiated than the direct ones. The highest direct estimates were obtained for regions (7) and (3) (see tab. 1). Interestingly, these two regions yield the two extreme values of the relative estimation error (REE): (7) the lowest, equal to 6%, and (3) the highest, 37%. The result obtained for region (3) is certainly influenced by one of the smallest sample sizes, found in this region, which caused random effects of a high amplitude. Application of the synthetic regression and composite estimators reduces the estimated revenue and the value of REE (from 37% to 8%).

(ii) The decrease of the REE obtained for the composite estimator was more than fourfold for the regions with the largest REE of the direct estimator. For more precise direct estimates, the reduction was not that significant.

(iii) The differentiation of the estimated average revenue per enterprise is much greater by type of economic activity (see tab. 2). The top average revenue was noted for financial intermediation and for wholesale and retail trade. Application of the synthetic and composite estimators gives a shrinkage effect. As in the regional cross-section, the highest value of the REE is observed for the extreme estimate of the average revenue, that is for financial intermediation (REE = 5%). The composite estimator reduces this error to about 8%. For sections with large representation, such as manufacturing or wholesale, the direct and composite estimators gave similar results and the estimation precision remained almost unchanged. But again the great variation of revenue by type of economic activity produced significantly biased synthetic regression estimates. This entailed a significant increase of REE for such divisions as real estate or hotels and restaurants.
Application of the composite estimator reduced the REE for each division of economic activity, including those with a high REE under the synthetic regression approach.
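The mechanics behind these reductions, i.e. the variance-based weight, the composite estimate, its estimated variance and the deff coefficient described in the section on composite estimation, can be sketched as follows. The function name and the input figures are hypothetical, and the sketch assumes, as the text does, zero covariance between the direct and the synthetic regression estimator.

```python
def composite_estimate(y_dir, v_dir, y_syn, v_syn):
    """Combine a direct and a synthetic estimate for one domain.

    y_dir, v_dir: direct estimate and its estimated variance.
    y_syn, v_syn: synthetic regression estimate and its estimated variance.
    Assumes zero covariance between the two estimators.
    Returns (gamma, y_com, v_com, deff).
    """
    if v_dir <= 0 or v_syn < 0:
        raise ValueError("variances must be non-negative, direct variance positive")
    total = v_dir + v_syn
    gamma = v_syn / total                          # weight of the direct estimator
    y_com = gamma * y_dir + (1.0 - gamma) * y_syn  # weighted mean of the two estimates
    v_com = v_dir * v_syn / total                  # estimated variance of the composite
    deff = v_com / v_dir                           # below 1 means a gain in precision
    return gamma, y_com, v_com, deff


# Hypothetical domain (illustration only): an unstable direct estimate
# combined with a more stable synthetic one.
gamma, y_com, v_com, deff = composite_estimate(100.0, 30.0, 80.0, 10.0)
print(gamma, y_com, v_com, deff)
```

With these weights the deff coefficient equals the weight itself, so the less precise the direct estimate is relative to the synthetic one, the smaller the weight it receives and the larger the resulting gain. This agrees with the observation that the reduction in REE is strongest for the domains with the largest REE of the direct estimator.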

Table 1. Estimation of the average revenue per company by regions, Poland 00
Columns: Serial number of the region | Estimate (in thousands PLN): DIR, COM | REE (%): DIR, COM
[Table values not preserved in the source text.]
Source: Own calculations based on Central Statistical Office data

Table 2. Estimation of the average revenue per company by sections of economic activity, Poland 00
Columns: Division | Estimate (in thousands PLN): DIR, COM | REE (%): DIR, COM
Rows: Manufacturing; Construction; Wholesale and retail trade; Hotels and restaurants; Transport, storage and communication; Financial intermediation; Real estate, renting and business activities; Education; Health and social work; Other community, social and personal service activities
[Table values not preserved in the source text.]
Source: Own calculations based on Central Statistical Office data

Now we pass to the estimates which are not produced by traditional methods and, being unreliable, are still not published by the Central Statistical Office. The results of indirect estimation of the average revenue per enterprise applied to the joint cross-section of regions and economic activity (R/S) are presented in tables 3 and 4. Two sections were chosen for this presentation: D (manufacturing) and G (wholesale and retail trade). These are the two sections with the highest sample representation in domains, for which the results are most reliable. At the same time, these sections show the smallest improvement in estimation precision generated by indirect estimators. The direct estimates of revenue are greater in wholesale and trade than in manufacturing (see tables 3 and 4 and graphs 1 and 2). Similarly, the REE values are much higher for wholesale and trade: from 7.8% (region ) to 44.9% (region 3). For manufacturing the REE varies from % (region 5) to 3% (region 4). Composite estimation improves the precision, to a smaller extent for manufacturing: from 9.6% (region ) to 4.7% (region 4). The reduction observed in the wholesale and trade section is stronger, as the REE takes values from 4.8% (region ) to 8.6% (region 9).

Table 3. Estimation of the average revenue per company, section D: manufacturing, by regions, Poland 00
Columns: Serial number of the region | Estimate (in thousands PLN): DIR, COM | REE (%): DIR, COM
[Table values not preserved in the source text.]
Source: Own calculations based on Central Statistical Office data

Graph 1. REE of the average revenue per enterprise, section D: manufacturing, by regions, Poland 00
[Graph: REE (vertical axis) by serial number of the region (horizontal axis); series: DIR, COM]
Source: Own calculations based on Central Statistical Office data

Table 4. Estimation of the average revenue per company, section G: wholesale and retail trade, by regions, Poland 00
Columns: Serial number of the region | Estimate (in thousands PLN): DIR, COM | REE (%): DIR, COM
[Table values not preserved in the source text.]
Source: Own calculations based on Central Statistical Office data

Graph 2. REE of the average revenue per enterprise, section G: wholesale and retail trade, by regions, Poland 00
[Graph: REE (vertical axis) by serial number of the region (horizontal axis); series: DIR, COM]
Source: Own calculations based on Central Statistical Office data

To summarize the above discussion, as well as the estimates obtained for other variables in each of the analyzed divisions, the results are set out in table 5. The table contains the extreme values of the relative estimation error (REE) obtained for the direct and composite estimators for all four variables analyzed. As could be expected, both extreme values are much greater for direct estimates than for composite estimation. The most significant gain in estimation precision concerns the smallest domains, that is the joint cross-section by regions and economic activity (R/S). This statement is valid for all variables estimated, but the gain in precision differs depending on the variable. The greatest improvement concerns the average number of employees and the gross wage. The absolute difference between the extreme values of REE amounts to 6 percentage points in the case of employment, and to 56 percentage points in the case of the average total amount of salaries paid.

Table 5. Estimation precision characterizing economic activity of small enterprises for different types of domains*, Poland 00
Columns: Variable | Region (R): REE DIR, REE COM | PKD Section (S): REE DIR, REE COM | Region/Section (R/S): REE DIR, REE COM
Rows: Number of employees (MIN, MAX, AVERAGE); Gross wage (MIN, MAX, AVERAGE); Revenue (MIN, MAX, AVERAGE); Costs (MIN, MAX, AVERAGE)
[Table values not preserved in the source text.]
Remark: * for selected PKD sections: D, F, G, K
Source: Own calculations based on Central Statistical Office data

Table 6. Gain in estimation precision characterizing economic activity of small enterprises for different types of domains, Poland 00
Columns: Serial number of the region | Gross wage: Manufacturing, Wholesale | Number of employees: Manufacturing, Wholesale | Revenue: Manufacturing, Wholesale | Costs: Manufacturing, Wholesale
[Table values not preserved in the source text.]
Source: Own calculations based on Central Statistical Office data

Gain in estimation precision is usually analyzed in terms of the deff ratio (Kalton, Brick, Lê, 2005). These measures of effectiveness, obtained for small domain estimation in comparison with direct estimates, show an impressive gain in estimation precision. For the joint cross-section by region and economic activity (R/S), the best results obtained for the composite estimator indicate that for the number of employees the variance $\hat{V}(\hat{Y}_{(COM)})$ is on average about 85% smaller than the variance of the direct estimator, while for revenue this average gain amounts to 5%. The average gain obtained for domains identified as (R) or (S) is of course smaller but still considerable. In estimating the number of employees, applying the composite estimator with area level covariates instead of the direct one yields an average gain of about 69% for regions (R) and 7% for PKD sections of economic activity (S). In estimating revenue, the average gain amounts to 55% and 6% respectively. It should be underlined that the results obtained are not stable across types of domain and target variables.

Conclusions

The study leads to the following conclusions concerning the possibilities of improving the precision of estimates of economic activity for small business at regional level (R) and across kinds of activity (S), and of extending the estimates to the joint distribution of regions and sections of economic activity (R/S):

1. The approach involving de-aggregating direct estimates for lower level domains is characterised by great simplicity, yet synthetic estimates can be heavily biased due to the incompleteness of the tax register as an auxiliary data source.
2. The Ministry of Finance tax register seems to be a reliable source of information. For it to be used effectively, however, it needs to be complete. Since over 40% of the units sampled could not be matched with records in the tax registers, there is a danger that the structure of the population surveyed is going to be heavily distorted by error.
Units that could not be matched may constitute a specific group. In view of the incompleteness of tax register data, the fractions adopted as the basis of de-aggregation of SP-3 data contain an error that cannot be determined.
3. The results obtained in the course of the study are not fully satisfactory, but the estimates are reasonable and encouraging. The regression estimates do not produce the plainly wrong results that were observed when ratio estimation was applied.
4. Estimation results based on regression models seem to be less biased in the regional cross-section (R) than those across PKD categories (S). Further studies adopting regression estimation as well as EBLUP estimation should focus on modelling the relation between the estimated and the auxiliary variables. Each cross-section should be modelled separately and based on an in-depth analysis of model quality. This requirement is a significant limitation in wider-scale applications.
5. Owing to the highly non-homogeneous nature of the distributions of the estimated variables, the relevant solutions should be further modified, e.g. by adopting a stratification of companies depending on their size measured by the number of employees: 0 employees; 5 employees and 6-9 employees (Marker, 2001).
6. Making use of the composite estimator to balance the synthetic estimation bias and the instability of direct estimates improves estimation precision. The resulting gain increases with decreasing sample size in the domain and for lower domain levels.
7. It is extremely important for future estimations that public statistics is granted access to complete tax registers as well as other administrative registers, including those containing judicial and social data. This would, hopefully, enable analysts to make significant improvements in estimation precision and reduce the degree of error.
8. Estimation quality can also be enhanced by tapping alternative sources of data for consecutive years, which would account for variation in response variables over time. There is a lot to be said in favour of introducing elements of dynamics, especially in view of the high changeability of economic processes in the country as well as of the methodology of regional research itself.

The problems faced during the research study should not result in terminating the work on applying administrative registers to improve small business statistics. The application of composite estimation resulted in a distinct improvement in comparison with the de-aggregating approach and with synthetic estimates.
The experience gained suggests further work on finding more adequate small area estimation methods for small business characteristics at regional level with respect to the type of economic activity.

REFERENCES

BRACHA C. (1996), Theoretical background of survey sampling, Wydawnictwo Naukowe PWN, Warsaw (in Polish).
BRACHA C. (2003), Estimation of Labour Force Survey data for poviats, years , Central Statistical Office, Warsaw (in Polish).
BRACHA C., LEDNICKI B., WIECZORKOWSKI R. (2004), Application of composite estimation methods to de-aggregate data from Labour Force Survey, 2003, Central Statistical Office, Warsaw (in Polish).
BROWN G., CHAMBERS R., HEADY P., HEASMAN D. (2001), Evaluation of small area estimation methods - an application to unemployment estimates from the UK LFS, Proceedings of Statistics Canada Symposium 2001, Achieving Data Quality in a Statistical Agency: a Methodological Perspective.
FALORSI P. D., FALORSI S., RUSSO A., PALLARA S. (2000), Small Domain Estimation Methods for Business Surveys, Statistics in Transition, June 2000, Vol. 4, No.5, pp
GHOSH M., RAO J.N.K. (1994), Small Area Estimation: An Appraisal, Statistical Science, Vol. 9, No.
HOLMOY A.M.K., THOMSEN I. (1998), Combining Data from Surveys and Administrative Record Systems. The Norwegian Experience, International Statistical Review, No.66, pp.0.
KALTON G., BRICK J. M., LÊ T. (2005), Estimating components of design effects for use in sample design, Household Sample Surveys in Developing and Transition Countries, Studies in Methods, Series F, No. 96, Statistics Division, United Nations, New York, pp. 94.
KISH L. (2003), The hundred years' wars of survey sampling, Statistics in Transition vol., Number 5, pp
KLIMANEK T., PARADYSZ J. (2006), Adaptation of EURAREA experience in business statistics, Statistics in Transition, Vol.7, No. 4, pp.?
KORDOS J., PARADYSZ J. (2000), Some experiments in small area estimation in Poland, Statistics in Transition Vol. 4, No. 4., pp
MARKER D.A. (2001), Producing small area estimates from national surveys: method for minimizing use of indirect estimates, Survey Methodology, December 2001, Vol. 7, No., pp.83 88, Statistics Canada.
PARADYSZ J. (2004), Providing a regional statistics with support of small areas in a perspective of administrative registers application, [in:] Wiadomosci Statystyczne, No. 3, pp. 9 (in Polish).
PAWLOWSKA Z. (2005), Role of small and medium enterprises in creating a demand on work, [in:] Wiadomosci Statystyczne, No., pp (in Polish).
RAO J.N.K. (2003), Small Area Estimation, Wiley Interscience, John Wiley and Sons, Inc., Hoboken, New Jersey.
WIECZOREK P. (2003), Perspectives of small and medium enterprises after accession of Poland into the European Union, [in:] Wiadomosci Statystyczne, No., pp.4 34 (in Polish).
WITKOWSKA A., WITKOWSKI M. (1997), Usefulness of the existing statistical data sources to analyse economic activity at regional level, [in:] Regional statistics. Survey sampling and integration of data bases, ed. Jan Paradysz, p Wydawnictwo Akademii Ekonomicznej w Poznaniu, Poznan (in Polish).
ZAGOZDZINSKA I. (1996), Transformation of Enterprise Statistics, Wiadomosci Statystyczne, No., pp.0 (in Polish).

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4, pp

BALANCED AND COORDINATED SAMPLING DESIGNS FOR SMALL DOMAIN ESTIMATION

Piero Demetrio Falorsi, Danilo Orsini, Paolo Righi

Italian National Institute of Statistics - ISTAT, Via Magenta 4, 0085 Roma.

ABSTRACT

This work examines balanced sampling and coordinated sampling, which are useful for obtaining planned sample sizes for domains belonging to different partitions of the population. In particular, these sampling methods may be applied to deal with small area problems in the sampling design phase. The main advantage of the two techniques is their computational feasibility, which makes it easy to implement an overall strategy that considers the design and estimation phases jointly and improves the efficiency of the estimators. An empirical simulation on real population data with different domain estimators shows the empirical properties of the sample designs examined.

Key words: Planning Sample Size of Small Domains, Controlled Selection, Sample Coordination, Balanced Sampling.

1. Introduction

The small area problem is usually considered to be treated via estimation. However, if the domain indicator variables are available for each unit in the population, there are opportunities to be exploited at the survey design stage. As noted by Singh et al. (1994), there is a need to develop an overall strategy that deals with small area problems, involving both the planning of the sample design and estimation aspects. In this framework it is crucial to control the sample size for each domain of interest, so that each domain is treated at the design stage as a planned domain for which it is possible to produce direct estimates with a prefixed level of precision. In general, with a design-based approach to inference, the presence of sample units in each domain makes it possible to compute domain estimates, although these are not always reliable. Furthermore, in the model-based or model-assisted approach, the presence of sample units in each estimation domain permits the use of models with specific small area effects, allowing more accurate estimates of the parameters of interest at small area level (Lehtonen et al., 2003).

For small domains it will often be necessary to use indirect estimators (Rao, 2003), such as synthetic estimators, to produce reliable estimates; nevertheless, synthetic estimators depend heavily on the assumed model providing a good description of the data structure. For a true model there will be no design bias, but if the model does not fit well for the whole domain structure there will be some design bias, at least in part of the data, inflating the mean square error of the estimates. Of course, there will never be a true model available in practice. In order to overcome this problem it is crucial to base the domain estimation on the sample units belonging to the small domain.

The sampling design techniques considered in this paper allow controlling the sample sizes for domains of interest which are defined by different partitions of the reference population. Such techniques are useful when the overall sample size is relatively small and, therefore, some of the partitions contain small domains. In fact, when the aim of the survey is to produce estimates for two or more partitions of the population, a standard solution for obtaining planned sample sizes for the domains of interest is to use a stratified sample with the strata given by the cross-classification of the variables defining the different partitions. In the following, this design will be denoted as the cross-classification design. In many practical situations, the cross-classification design is unfeasible since it requires the selection of at least as many sampling units as the product of the numbers of categories of the stratification variables.
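The size argument against the cross-classification design can be made concrete with a small Python sketch. The numbers of categories are hypothetical, and the second function gives only a simple lower bound for a design that controls marginal domain sizes: each sampled unit falls into exactly one category of every partition, so at least as many units are needed as the largest partition has categories. Whether this bound can be attained depends on the population and is not claimed here.

```python
from math import prod


def min_sample_cross_classification(categories, per_stratum=1):
    """Smallest sample with `per_stratum` units in every cross-classified stratum.

    categories: number of categories of each stratification variable.
    """
    return per_stratum * prod(categories)


def lower_bound_marginal_control(categories, per_domain=1):
    """Lower bound on the sample size when only marginal domains are controlled."""
    return per_domain * max(categories)


# Hypothetical survey with two partitions: 20 regions and 12 activity classes.
partitions = [20, 12]
n_cross = min_sample_cross_classification(partitions)          # one unit per cell
n_cross_two = min_sample_cross_classification(partitions, 2)   # two units per cell
n_marginal = lower_bound_marginal_control(partitions)
print(n_cross, n_cross_two, n_marginal)
```

Requiring two units per stratum, a common choice when variances are to be estimated within strata, doubles the cross-classified minimum, while the marginal bound grows only with the largest partition. This gap motivates designs that control the marginal sample sizes directly.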
To overcome some problems of cross-classification designs, an easy strategy is to drop one or more stratifying variables or to group some of the categories. However, some planned domains then become unplanned and some of them can have a small or null sample size.

Many methods have been proposed in the literature to keep the sample size under control in all the categories of the stratifying variables without using a cross-classification design. These methods are generally referred to as multi-way stratification techniques, and have been developed under two main approaches: methods based on selection according to Latin Squares or Latin Lattices schemes (Bryant et al., 1960; Jessen, 1970); methods based on controlled rounding problems via linear programming (Causey et al., 1985; Sitter and Skinner, 1994).

The seminal paper of Bryant et al. (1960) suggests allocating the units in the sample by means of a randomly selected two-way Latin Square table, and proposes two estimators of the parameter of interest. The method implies that the expected sample counts in each stratum display independence between the rows and columns of the two-way table; it can therefore be used without limitations only when the stratifying variables are also independent in the population. Another drawback is that the procedure cannot be implemented when there is no population in one or more cross-classification strata. To solve these problems, Jessen (1970) proposes two approaches, both fairly complicated to implement and not always leading to a solution (Causey et al., 1985).

As far as the methods based on linear programming are concerned, Causey et al. (1985) consider controlled multi-way stratification as a rounding problem solved by means of transportation theory. The method may not have a solution in the case of three or more stratification variables. Following the linear programming approach to controlled sampling designs proposed by Rao and Nigam (1990, 1992), Sitter and Skinner (1994) suggest a linear programming method that adapts to more situations than the method of Causey et al. (1985); some further computational simplifications of the Sitter and Skinner method have been suggested by Lu and Sitter (2002). Nevertheless, the main weakness of the linear programming approach is its computational complexity. As a consequence, the drawbacks of both approaches have limited the use of multi-way stratification techniques as a standard solution for planning survey sampling designs.

This paper illustrates how two well-known sampling techniques may be useful for controlling the sample size in all the domains of interest without suffering from the disadvantages of the above-mentioned methods. The first technique tackles the problem by selecting a balanced sample (Deville and Tillé, 2004) based on domain membership indicator variables. In this context the balancing equations ensure that the prefixed marginal sample sizes are satisfied. The second technique deals with the problem by means of procedures developed to guarantee sample coordination. The sampling units are selected from separate stratified sampling designs, each one defined by a single auxiliary variable. The samples are selected using permanent random number techniques (Ohlsson, 1995), ensuring the maximum overlap of the samples.

The paper is organised as follows. Section 2 introduces the essential notation.
Section 3 is dedicated to balanced sampling, while section 4 focuses on coordination techniques. Section 5 deals with estimation, and section 6 shows the main results of a simulation study conducted on a real population of Italian enterprises. Finally, some brief conclusions are given in section 7.

2. Notation and parameters of interest

In order to simplify the notation, the paper is restricted to the case of two stratifying variables. However, the generalisation to the multi-way case is straightforward.

Let $U$ denote the population of interest of size $N$. Suppose that two auxiliary variables $V_1$ and $V_2$, with $R$ and $C$ modalities respectively, are known for each unit $k$ ($k=1,\dots,N$) of the population. The generic value or class of values assumed by $V_1$ and $V_2$ is denoted by $v_{1r}$ and $v_{2c}$ respectively. Let $U_{rc} = U_{r.} \cap U_{.c}$ be the subpopulation of size $N_{rc} \ge 0$, where $U_{r.}$ and $U_{.c}$ are the subpopulations of size $N_{r.} = \sum_c N_{rc}$ and $N_{.c} = \sum_r N_{rc}$ for which $V_1 = v_{1r}$ and $V_2 = v_{2c}$. Let $y_k$ denote the value of the target variable in unit $k$ ($k \in U$). There are $R + C + 1$ parameters of interest, defined by

$$Y = \sum_{k \in U} y_k, \qquad Y_{r.} = \sum_{k \in U_{r.}} y_k \ (r=1,\dots,R), \qquad Y_{.c} = \sum_{k \in U_{.c}} y_k \ (c=1,\dots,C).$$

In order to obtain the sample estimates of the parameters of interest, the standard cross-classification design selects a stratified sample $s$ of size $n$ from the population $U$ with probability $p(s)$, where the strata are obtained by combining the categories of the variables $V_1$ and $V_2$. Let $s_{rc}$ ($r=1,\dots,R$; $c=1,\dots,C$) be the subsample of $s$ of size $n_{rc}$, where $\min(b, N_{rc}) \le n_{rc} \le N_{rc}$, selected from population $U_{rc}$; $b$ denotes the minimum sample size fixed in advance, where $b$ is equal to 1 for producing unbiased estimates of the parameters of interest or equal to 2 for being able to compute unbiased variance estimates. Furthermore, $\mathbf{n} = \{n_{rc}\}$ is the set of the strata frequencies, while $\mathbf{n}_1 = (n_{1.},\dots,n_{r.},\dots,n_{R.})'$ and $\mathbf{n}_2 = (n_{.1},\dots,n_{.c},\dots,n_{.C})'$ are the marginal distributions of $\mathbf{n}$ with respect to $V_1$ and $V_2$, with $n_{r.} = \sum_c n_{rc}$ and $n_{.c} = \sum_r n_{rc}$.

We will call marginal stratification with respect to $V_1$ the partition of $U$ into $U_{1.},\dots,U_{r.},\dots,U_{R.}$, so that $n_{r.}$ is the sample size of the subsample $s_{r.} = \bigcup_{c=1}^{C} s_{rc}$ of the generic marginal stratum $r$; we will call marginal stratification with respect to $V_2$ the partition of $U$ into $U_{.1},\dots,U_{.c},\dots,U_{.C}$, so that $n_{.c}$ is the sample size of the subsample $s_{.c} = \bigcup_{r=1}^{R} s_{rc}$ of the generic marginal stratum $c$.

3. Methods based on balanced sampling

Multi-way stratification designs can be treated by means of balanced sampling. The definition of a balanced sample depends on the assumed inferential framework.
In the model-based approach, a sample is defined as balanced on a set of auxiliary variables if the sample means of the auxiliary variables equal their known population means (Royall and Herson, 1973; Valliant et al., 2000). Following the design-based approach considered in this paper, a sample is balanced when the Horvitz-Thompson estimates of the totals of the auxiliary variables are equal to their known population totals (Deville and Tillé, 2004). Balanced sampling, in this last case, represents a generalisation of probability sampling designs exploiting auxiliary information known for each unit of the population. Standard designs such as simple random sampling with fixed size, stratified sampling and unequal probability sampling are specific applications of balanced designs. The multi-way stratification designs belong to this class of designs too.

3.1. General introduction to balanced sampling

For a formal description of balanced sampling, we introduce a general definition of sampling design. A sampling design is a probability distribution $p(\cdot)$ on the set $S$ of all the subsets $s$ of the population $U$ such that $\sum_{s \in S} p(s) = 1$, where $p(s)$ is the probability that the sample $s$ is drawn. The set $S$ may be represented by the random vector $\boldsymbol{\Lambda}$ that takes the value $\boldsymbol{\lambda} = (\lambda_1,\dots,\lambda_k,\dots,\lambda_N)'$, with $\lambda_k = 1$ if $k \in s$ and $\lambda_k = 0$ otherwise. Let $\boldsymbol{\pi} = (\pi_1,\dots,\pi_k,\dots,\pi_N)'$ be the vector of inclusion probabilities. We consider the sampling design $p(\cdot)$ with inclusion probabilities $\boldsymbol{\pi}$, which assigns a probability $p(s)$ to each sample $s$ such that $E(\boldsymbol{\lambda}) = \sum_{s \in S} p(s)\boldsymbol{\lambda} = \boldsymbol{\pi}$. Let $\mathbf{x}_k = (x_{1k},\dots,x_{hk},\dots,x_{Hk})'$ be a vector of $H$ auxiliary variables available for each population unit. The sampling design $p(s)$ is said to be balanced with respect to the $H$ auxiliary variables if and only if it satisfies the balancing equations

$${}_{dir}\hat{\mathbf{X}} = \sum_{k \in U} \frac{\mathbf{x}_k \lambda_k}{\pi_k} = \sum_{k \in U} \mathbf{x}_k = \mathbf{X} \qquad (3.1)$$

for all $s \in S$ such that $p(s) > 0$, where ${}_{dir}\hat{\mathbf{X}}$ is the Horvitz-Thompson estimator of the known population total $\mathbf{X}$ of the auxiliary variables. Two simple examples of balanced sampling designs are illustrated in the following.

Example 1. The design of fixed sample size $n$ is balanced on the auxiliary variable $x_k = \pi_k$, since for all $s \in S$

$$\sum_{k \in U} \frac{x_k \lambda_k}{\pi_k} = \sum_{k \in s} 1 = \sum_{k \in U} \pi_k = n.$$
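The identity in Example 1 can be checked numerically. The following is a minimal sketch with a hypothetical six-unit population; the helper name `ht_total` is illustrative and not from the paper. The samples are drawn with equal probabilities purely for convenience, because the identity holds for every sample of size n, whatever design produced it.

```python
import random
from fractions import Fraction


def ht_total(sample, x, pi):
    """Horvitz-Thompson estimate of the total of x computed from the sampled units."""
    return sum(x[k] / pi[k] for k in sample)


# Hypothetical inclusion probabilities; they sum to the fixed sample size n = 3.
pi = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2),
      Fraction(1, 2), Fraction(3, 4), Fraction(3, 4)]
n = sum(pi)
x = pi  # Example 1: the balancing variable is x_k = pi_k

rng = random.Random(1)
estimates = [ht_total(rng.sample(range(len(pi)), int(n)), x, pi) for _ in range(20)]
print(set(estimates), sum(x))  # every estimate equals the population total of x
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the balancing equation can be compared with `==` rather than with a tolerance.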

Example 2. The stratified design, where a simple random sample of size $n_h$ is drawn from each stratum $U_h$ of size $N_h$, is balanced on $H$ auxiliary variables such that the generic $h$-th variable takes, for the $k$-th unit, the values

$$x_{hk} = \begin{cases} 1 & \text{if } k \in U_h \\ 0 & \text{if } k \notin U_h. \end{cases}$$

The balancing equations become

$$\sum_{k \in U} \frac{x_{hk} \lambda_k}{\pi_k} = \sum_{k=1}^{n_h} \frac{N_h}{n_h} = \sum_{k \in U} x_{hk} = N_h$$

for all $s \in S$ and for $h=1,\dots,H$.

One of the main problems of balanced sampling has always been the implementation of a general procedure that gives a multivariate balanced random sample (see Valliant et al., 2000). Recently, Deville and Tillé (2004) proposed the cube method, which allows the selection of balanced (or approximately balanced) samples for a large set of auxiliary variables and with respect to different vectors of inclusion probabilities. The method consists of at most two phases. In the first phase, called the flight phase, the cube method searches for a sampling design satisfying equations (3.1). The sample selection is performed by randomly rounding almost all the elements of the vector $\boldsymbol{\pi}$ to 1 or 0. At the end of the flight phase the elements of $\boldsymbol{\pi}$ equal to 1 or 0 indicate respectively that the corresponding unit is included or not included in the sample. However, the balancing equations could be exactly satisfied with some fractional elements of $\boldsymbol{\pi}$ (the number of these elements is at most equal to the number of balancing variables), and in that case the flight phase does not determine whether the corresponding units have to be drawn or not. In these cases the cube method searches for an approximately balanced sample by applying the landing phase; this solution relaxes some constraints while minimising the variance of ${}_{dir}\hat{\mathbf{X}}$. The landing phase is set up as a rounding problem and is solved via linear programming techniques. A simple example where the landing phase occurs is the design of fixed sample size when the sum of the inclusion probabilities is not an integer.
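The flight phase described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation and not the fast algorithm of Chauvet and Tillé: the function names `null_vector` and `flight_phase` are hypothetical, exact rational arithmetic replaces floating point, and the toy population (four cells with inclusion probabilities chosen so that every row and column margin equals 2) is invented. At each step the vector of probabilities moves along a direction that leaves the balancing equations unchanged until at least one more element reaches 0 or 1.

```python
import random
from fractions import Fraction


def null_vector(matrix):
    """Return one non-zero u with matrix @ u = 0 (exact arithmetic), or None if none exists."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    pivots, row = [], 0
    for col in range(n_cols):
        if row == n_rows:
            break
        pr = next((r for r in range(row, n_rows) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[row], m[pr] = m[pr], m[row]
        piv = m[row][col]
        m[row] = [v / piv for v in m[row]]
        for r in range(n_rows):
            if r != row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
    free = next((c for c in range(n_cols) if c not in pivots), None)
    if free is None:
        return None
    u = [Fraction(0)] * n_cols
    u[free] = Fraction(1)
    for i, pc in enumerate(pivots):
        u[pc] = -m[i][free]
    return u


def flight_phase(pi, balance, rng):
    """pi: Fractions in [0, 1]; balance[h][k] = x_hk / pi_k, computed once from the initial pi."""
    pi = list(pi)
    while True:
        frac = [k for k, p in enumerate(pi) if 0 < p < 1]
        if not frac:
            break
        u = null_vector([[row[k] for k in frac] for row in balance])
        if u is None:  # no balanced direction left: a landing phase would be needed
            break
        up, down = [], []  # largest admissible steps along +u and along -u
        for k, u_k in zip(frac, u):
            if u_k > 0:
                up.append((1 - pi[k]) / u_k)
                down.append(pi[k] / u_k)
            elif u_k < 0:
                up.append(-pi[k] / u_k)
                down.append((pi[k] - 1) / u_k)
        lam1, lam2 = min(up), min(down)
        # The step has zero expectation, so the inclusion probabilities are preserved.
        step = lam1 if rng.random() < lam2 / (lam1 + lam2) else -lam2
        for k, u_k in zip(frac, u):
            pi[k] += step * u_k
    return pi


# Hypothetical population: cell (row, col) -> (number of units, inclusion probability).
cells = {(0, 0): (3, Fraction(1, 3)), (0, 1): (2, Fraction(1, 2)),
         (1, 0): (2, Fraction(1, 2)), (1, 1): (4, Fraction(1, 4))}
rows, cols, pi0 = [], [], []
for (r, c), (count, p) in cells.items():
    rows += [r] * count
    cols += [c] * count
    pi0 += [p] * count
# Balancing variables x_k = pi_k times a row or column indicator, so x_k / pi_k is the indicator.
balance = [[Fraction(int(lab == r)) for lab in rows] for r in (0, 1)] + \
          [[Fraction(int(lab == c)) for lab in cols] for c in (0, 1)]
sample = flight_phase(pi0, balance, random.Random(2006))
print(sample)
```

In this two-way example every run ends with all elements equal to 0 or 1 and with exactly two sampled units in each row and each column, which is the behaviour the text attributes to the flight phase for marginal stratification.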
In Example 1, the balancing equation is never satisfied by the flight phase alone if the known total $n$ is not an integer. As described below, in the multi-way stratification design the cube method is able to select a balanced sample using the flight phase only (Deville and Tillé, 2000); a brief description of the algorithm is given in the appendix. Chauvet and Tillé (2006) present a faster algorithm for executing the flight phase, implemented by means of a SAS-IML macro. The procedure is computationally feasible for large data sets with many balancing variables, as tested by the authors.

3.2. Balanced sampling applied to marginal stratification

Let us suppose that a vector of inclusion probabilities $\boldsymbol{\pi}$, consistent with the marginal distributions $\mathbf{n}_1$ and $\mathbf{n}_2$, is available, that is

$$n_{r.} = \sum_{k \in U_{r.}} \pi_k \ (r=1,\dots,R), \qquad n_{.c} = \sum_{k \in U_{.c}} \pi_k \ (c=1,\dots,C). \qquad (3.2)$$

If $\boldsymbol{\pi}$ is not available, the well-known Iterative Proportional Fitting (IPF, see Bishop et al., 1975) or the Generalised Iterative Proportional Fitting (GIPF, Dykstra, 1985; Dykstra and Wollan, 1987) procedure can be used to define it. Starting from an initial vector ${}^{*}\boldsymbol{\pi} = ({}^{*}\pi_1,\dots,{}^{*}\pi_k,\dots,{}^{*}\pi_N)'$ of inclusion probabilities such that $\sum_{k \in U} {}^{*}\pi_k = n$, IPF or GIPF adjusts the initial probabilities and computes the final vector $\boldsymbol{\pi}$ satisfying the marginal sample size constraints (3.2). The main practical difference between IPF and GIPF is that the second procedure avoids inclusion probabilities greater than 1. Therefore, it could be better to adopt GIPF to satisfy equations (3.2). However, GIPF and IPF produce the same results when the latter procedure does not yield a solution with elements of $\boldsymbol{\pi}$ greater than 1. A third, more empirical procedure to calculate $\boldsymbol{\pi}$ satisfying the constraints (3.2) is to use the IPF procedure iteratively in the following way: starting from the initial vector of inclusion probabilities ${}^{*}\boldsymbol{\pi}$, at the end of the procedure the $\pi_k$ greater than 1 are set equal to 1. Then the IPF procedure is repeated with the same starting inclusion probabilities, removing the units with inclusion probabilities equal to 1 and updating the marginal constraints so that, if the generic removed unit $k$ belongs to the subpopulation $U_{rc}$, the new marginal sample sizes become $n_{r.} - 1$ and $n_{.c} - 1$.

Selecting a balanced design that respects the planned sample sizes of the two marginal distributions depends on the choice of the auxiliary variables.
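The third, empirical procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `ipf` and `ipf_capped`, the tolerances and the seven-unit population are hypothetical, and the sketch does not guard against a marginal stratum whose remaining target drops to zero.

```python
def ipf(rows, cols, n_row, n_col, start, tol=1e-10, max_iter=1000):
    """Scale the starting probabilities alternately to the row and column sample sizes."""
    pi = list(start)

    def totals(labels):
        out = {}
        for lab, p in zip(labels, pi):
            out[lab] = out.get(lab, 0.0) + p
        return out

    for _ in range(max_iter):
        row_tot = totals(rows)
        pi = [p * n_row[r] / row_tot[r] for p, r in zip(pi, rows)]
        col_tot = totals(cols)
        pi = [p * n_col[c] / col_tot[c] for p, c in zip(pi, cols)]
        row_tot = totals(rows)
        if all(abs(row_tot[r] - n_row[r]) < tol for r in row_tot):
            break
    return pi


def ipf_capped(rows, cols, n_row, n_col, start):
    """Repeat IPF, fixing at 1 the units whose probability exceeds 1 and reducing the margins."""
    n_row, n_col = dict(n_row), dict(n_col)
    pi = [None] * len(start)
    free = list(range(len(start)))
    while free:
        sol = ipf([rows[k] for k in free], [cols[k] for k in free],
                  n_row, n_col, [start[k] for k in free])
        over = [k for k, p in zip(free, sol) if p > 1 + 1e-9]
        if not over:
            for k, p in zip(free, sol):
                pi[k] = min(p, 1.0)
            break
        for k in over:  # take-all units: remove them and lower both marginal targets by 1
            pi[k] = 1.0
            n_row[rows[k]] -= 1
            n_col[cols[k]] -= 1
        free = [k for k in free if k not in over]
    return pi


# Hypothetical population of 7 units; the single unit in cell (0, 1) would get pi = 4/3 under plain IPF.
rows = [0, 0, 1, 1, 1, 1, 1]
cols = [0, 1, 0, 0, 0, 0, 1]
n_row = {0: 2, 1: 2}
n_col = {0: 2, 1: 2}
start = [4 / 7] * 7  # initial probabilities n / N
pi = ipf_capped(rows, cols, n_row, n_col, start)
print([round(p, 4) for p in pi])
```

The returned probabilities meet the original marginal sample sizes (3.2) and none of them exceeds 1. GIPF reaches the same goal in a single constrained fit; it is not implemented here.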
We use the vector of auxiliary variables

$$\mathbf{x}_k = (x_k^{1.},\dots,x_k^{r.},\dots,x_k^{R.},\,x_k^{.1},\dots,x_k^{.c},\dots,x_k^{.C})'$$

where

$$x_k^{r.} = \begin{cases} \pi_k & \text{if } k \in U_{r.} \\ 0 & \text{if } k \notin U_{r.} \end{cases} \qquad x_k^{.c} = \begin{cases} \pi_k & \text{if } k \in U_{.c} \\ 0 & \text{if } k \notin U_{.c}. \end{cases}$$

The flight phase of the cube method selects a random sample which satisfies the following constraints:

$${}_{dir}\hat{X}^{r.} = \sum_{k \in U} \frac{x_k^{r.} \lambda_k}{\pi_k} = \sum_{k \in s_{r.}} 1 = \sum_{k \in U} x_k^{r.} = \sum_{k \in U_{r.}} \pi_k = n_{r.}$$

$${}_{dir}\hat{X}^{.c} = \sum_{k \in U} \frac{x_k^{.c} \lambda_k}{\pi_k} = \sum_{k \in s_{.c}} 1 = \sum_{k \in U} x_k^{.c} = \sum_{k \in U_{.c}} \pi_k = n_{.c} \qquad (r=1,\dots,R;\ c=1,\dots,C) \qquad (3.3)$$

Equations (3.3) imply that the sample sizes of the subsamples $s_{r.}$ and $s_{.c}$ must be equal to the fixed marginal frequencies $n_{r.}$ and $n_{.c}$ respectively. Deville and Tillé (2000) show that equations (3.3) can be exactly satisfied by means of the cube method via the flight phase.

Finally, we underline some interesting aspects of balanced sampling. The cube method is quite general and can easily be applied to more than two partitions of the population into domains of interest. Additional balancing equations may be defined, involving other variables correlated with the study variables, so as to improve the precision of the estimators of totals. As far as the estimation phase is concerned, balanced sampling can be used jointly with an estimator exploiting auxiliary information. This sampling strategy makes it possible to bound the variability of the sampling weights (for instance calibrated weights), which causes problems especially in small estimation domains. For instance, if known population counts are used in the calibration phase at domain level, it should be useful to use the same auxiliary information as constraints in the design phase, in addition to the auxiliary variables that ensure the marginal sample sizes are respected.

4. Methods based on coordinated sampling

The problem under study may be considered as a particular case of sample coordination. Generally, coordinated sampling is adopted in order to guarantee a predefined expected overlap rate between two or more surveys. Minimum and maximum overlap are referred to as negative and positive coordination respectively. In our case, the sampling units are selected from two separate sampling designs on the same target population, using $V_1$ and $V_2$ respectively as stratification variables. The two samples are maximally positively coordinated and the resulting final sample is the union of the two samples.
In this context the techniques can be classified into: methods based on Permanent Random Numbers (PRN); methods implementing linear programming algorithms (Ernst and Paben, 2002).

Within the first group of techniques (Ohlsson, 1995), a random number (PRN), drawn independently from the uniform distribution on the interval [0,1], is assigned to each unit in the frame list. The PRN techniques make it possible to implement different sampling designs such as simple random sampling without replacement, Bernoulli, Poisson and other probability proportional to size sampling procedures. Then, using the assigned PRNs, two maximally positively coordinated samples are selected from the two sampling designs.

The second type of methods, which will not be discussed in this paper, are procedures maximising the overlap among samples, based on the sequential application of transportation problems. The algorithms seem to be rather complex compared with procedures based on permanent random numbers.

4.1. General introduction to coordinated sampling using permanent random numbers

In the PRN context, the sampling units are selected from two separate sampling designs, say $p_{(1)}(\cdot)$ and $p_{(2)}(\cdot)$, on the same target population, using $V_1$ and $V_2$ as stratification variables. Let $s_{(i)}$ and $S_{(i)}$ denote, respectively, the generic sample and the set of all the subsets $s_{(i)}$ generated by the design $p_{(i)}(\cdot)$ ($i=1,2$). The inclusion probabilities of unit $k$ in the marginal design $p_{(i)}(\cdot)$ are

$$\pi_{(i)k} = \sum_{s_{(i)} \in S_{(i)}} p_{(i)}(s_{(i)})\,\lambda_{(i)k},$$

where $\lambda_{(i)k}$ is a dummy variable that equals 1 if $k \in s_{(i)}$ and 0 otherwise. Two positively coordinated samples, say $s_{(1)}$ and $s_{(2)}$, are selected according to the designs $p_{(1)}(\cdot)$ and $p_{(2)}(\cdot)$ respectively, and the resulting final sample $s$ is the union of the two samples. Let $z_k$ denote the PRN assigned to unit $k$ ($k=1,\dots,N$) and, for each of the above designs, let $\mathbf{n}_1 = (n_{1.},\dots,n_{r.},\dots,n_{R.})'$ and $\mathbf{n}_2 = (n_{.1},\dots,n_{.c},\dots,n_{.C})'$ indicate the sample sizes, with

$$\sum_{r=1}^{R} n_{r.} = \sum_{c=1}^{C} n_{.c} = n, \qquad \sum_{k \in U_{r.}} \pi_{(1)k} = n_{r.} \quad \text{and} \quad \sum_{k \in U_{.c}} \pi_{(2)k} = n_{.c}.$$
The final sample $s = s_{(1)} \cup s_{(2)}$ is selected with the joint design $p(s)$ and the corresponding inclusion probabilities are given by $\pi_k = \Pr(k \in s) = \sum_{s \in S} p(s)\lambda_k$, with $\lambda_k = 1$ if either $\lambda_{(1)k}$ or $\lambda_{(2)k}$ equals 1.

With these techniques, the realised sizes of the final sample are random outcomes with expected values larger than the constraints $n$, $\mathbf{n}_1$ and $\mathbf{n}_2$. Empirical results on Italian business surveys have shown that the overall realised sample size generally exceeds the prefixed $n$ by up to 30%. In order to partially overcome this problem, section 4.3 presents an algorithm which, starting from initial probabilities ${}^{*}\pi_{(1)k}$ and ${}^{*}\pi_{(2)k}$, calculates modified inclusion probabilities $\pi_{(1)k}$ and $\pi_{(2)k}$. These guarantee that the expected marginal sample sizes in the joint sample design are roughly equal to the constraints $\mathbf{n}_1 = (n_{1.},\dots,n_{r.},\dots,n_{R.})'$ and $\mathbf{n}_2 = (n_{.1},\dots,n_{.c},\dots,n_{.C})'$.

In the following section we illustrate how coordination may be realised with some important designs.

4.2. Coordination with some sampling designs

Design I: simple random sampling without replacement in each marginal stratum

With this design we have $\pi_{(1)k} = n_{r.}/N_{r.}$ for $k \in U_{r.}$ ($r=1,\dots,R$) and $\pi_{(2)k} = n_{.c}/N_{.c}$ for $k \in U_{.c}$ ($c=1,\dots,C$). The selection scheme is very simple: for each design, the frame units are ordered by stratum and, in each stratum, by ascending order of $z_k$. In stratum $U_{r.}$ of the design $p_{(1)}(\cdot)$, the sample is composed of the first $n_{r.}$ units in the ordered list. In stratum $U_{.c}$ of the design $p_{(2)}(\cdot)$, the sample is composed of the first $n_{.c}$ units in the ordered list. Ohlsson (1992) presents a formal proof that this technique realises a sample in which the inclusion probabilities in the samples $s_{(1)}$ and $s_{(2)}$ agree with the prefixed values, that is $\Pr(k \in s_{(1)}) = \pi_{(1)k} = n_{r.}/N_{r.}$ for $k \in U_{r.}$ and $\Pr(k \in s_{(2)}) = \pi_{(2)k} = n_{.c}/N_{.c}$ for $k \in U_{.c}$. The unit $k$ is included in the overall sample $s$ if it is included in $s_{(1)}$, with probability $\pi_{(1)k}$, or if it is included in $s_{(2)}$, with probability $\pi_{(2)k}$. There is no formal proof of the values assumed by the inclusion probabilities $\pi_k$. Since $s$ is the union of the two samples, $\pi_k \ge \max(\pi_{(1)k}, \pi_{(2)k})$. Empirical simulations of coordinated sample selection with design I have shown that the $\pi_k$ values are only slightly larger than $\max(\pi_{(1)k}, \pi_{(2)k})$.

Design II: Poisson sampling

With this design we have

$$\pi_{(1)k} = n_{r.}\,\frac{g_k}{\sum_{k \in U_{r.}} g_k} \quad \text{for } k \in U_{r.} \ (r=1,\dots,R) \qquad (4.1)$$

and

$$\pi_{(2)k} = n_{.c}\,\frac{g_k}{\sum_{k \in U_{.c}} g_k} \quad \text{for } k \in U_{.c} \ (c=1,\dots,C), \qquad (4.2)$$

where $g_k$ is a non-negative measure of size for unit $k$. The unit $k$ is included in the sample $s_{(i)}$ ($i=1$ or 2) if $z_k \le \pi_{(i)k}$. The inclusion probabilities of the final sample are, thus, $\pi_k = \max(\pi_{(1)k}, \pi_{(2)k})$.

Design III: probability proportional to size without replacement

This design may only be approximated with PRN techniques. The two approximate solutions proposed are order sampling and collocated sampling. The joint design of both techniques may be summarised as follows: (i) the expected sample sizes are larger than the constraints $n$, $\mathbf{n}_1$ and $\mathbf{n}_2$; (ii) there is no formal proof of the values assumed by the inclusion probabilities $\pi_k$, but in practical situations they may be well approximated by $\pi_k = \max(\pi_{(1)k}, \pi_{(2)k})$.

Order sampling (Rosén, 1997a, 1997b) is an interesting way to avoid the perceived drawback of the random sample size of Poisson sampling while still coming very close to achieving the desired inclusion probabilities (4.1) and (4.2). A particular case of order sampling is Pareto sampling, carried out as follows: (i) for every $k \in U$, compute

$$\xi_{(i)k} = \frac{z_k\,(1 - \pi_{(i)k})}{(1 - z_k)\,\pi_{(i)k}} \quad (i=1,2);$$

(ii) for the sample design $p_{(i)}(\cdot)$ ($i=1,2$), the frame units are ordered by stratum and, in each stratum, by ascending order of $\xi_{(i)k}$. In stratum $U_{r.}$ of the design $p_{(1)}(\cdot)$, the sample $s_{(1)}$ is composed of the first $n_{r.}$ units in the ordered list, and in stratum $U_{.c}$ of the design $p_{(2)}(\cdot)$, the sample $s_{(2)}$ is composed of the first $n_{.c}$ units in the ordered list.

Another case of order sampling is sequential Poisson sampling. The sample selection is carried out with steps (i) and (ii) illustrated above, but the $\xi_{(i)k}$ values are defined by $\xi_{(i)k} = \xi_k = z_k / (N g_k)$.
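The coordinated Poisson and Pareto selections described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the population of 12 units, the size measures and all function names are hypothetical, and the sketch assumes that every inclusion probability is positive and at most 1.

```python
import random


def pps_probs(g, strata, sizes):
    """Inclusion probabilities n_h * g_k / sum of g over the stratum, as in (4.1) and (4.2)."""
    total = {h: sum(g_k for g_k, s in zip(g, strata) if s == h) for h in sizes}
    return [sizes[s] * g_k / total[s] for g_k, s in zip(g, strata)]


def poisson_sample(z, pi):
    """Poisson sampling with permanent random numbers: unit k is taken if z_k <= pi_k."""
    return {k for k, (z_k, p) in enumerate(zip(z, pi)) if z_k <= p}


def pareto_sample(z, pi, strata, sizes):
    """Pareto order sampling: in each stratum take the n_h units with the smallest xi_k."""
    sample = set()
    for h, n_h in sizes.items():
        units = [k for k in range(len(z)) if strata[k] == h]
        xi = {k: z[k] * (1 - pi[k]) / ((1 - z[k]) * pi[k]) for k in units}
        sample.update(sorted(units, key=xi.get)[:n_h])
    return sample


# Hypothetical frame: 3 marginal strata for V1 (rows) and 4 for V2 (columns).
N = 12
rows = [k // 4 for k in range(N)]
cols = [k % 4 for k in range(N)]
g = [1 + k % 5 for k in range(N)]  # size measure
n_row = {0: 2, 1: 2, 2: 2}
n_col = {0: 2, 1: 1, 2: 2, 3: 1}
pi1 = pps_probs(g, rows, n_row)
pi2 = pps_probs(g, cols, n_col)

rng = random.Random(7)
z = [rng.random() for _ in range(N)]  # one permanent random number per unit, shared by both designs

s_poisson = poisson_sample(z, pi1) | poisson_sample(z, pi2)
s1_pareto = pareto_sample(z, pi1, rows, n_row)
s2_pareto = pareto_sample(z, pi2, cols, n_col)
s_pareto = s1_pareto | s2_pareto
print(sorted(s_poisson), sorted(s_pareto))
```

Using the same `z` in both designs is what produces the positive coordination: under Poisson sampling the union is exactly the set of units with z_k no larger than the greater of the two probabilities, while under Pareto sampling each marginal sample has a fixed size and only the size of the union is random.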
Both of the above order sampling schemes realise the fixed marginal sample sizes $\mathbf{n}_1$ and $\mathbf{n}_2$ for a given marginal sample $s_{(i)}$ ($i=1$ or 2), but the inclusion probabilities for $p_{(1)}(\cdot)$ and $p_{(2)}(\cdot)$ are slightly different from the planned probabilities $\pi_{(1)k}$ and $\pi_{(2)k}$ expressed by (4.1) and (4.2) respectively.

With collocated sampling, the frame units are randomly ordered by stratum and, in each stratum, by ascending order of $z_k$, giving unit $k$ ($k \in U_{rc}$) the ranks $L_{r.,k}$ and $L_{.c,k}$ in the marginal strata $U_{r.}$ and $U_{.c}$ of the designs $p_{(1)}(\cdot)$ and $p_{(2)}(\cdot)$. Then the new variables

$$r_{(1)k} = \frac{L_{r.,k} - \varepsilon}{N_{r.}}, \qquad r_{(2)k} = \frac{L_{.c,k} - \varepsilon}{N_{.c}}$$

are computed, where $\varepsilon$ is a random number drawn independently from the uniform distribution on the interval [0,1]. The unit $k$ is included in the sample $s_{(i)}$ ($i=1$ or 2) if $r_{(i)k} \le \pi_{(i)k}$. For each marginal design, collocated sampling is strictly pps, with inclusion probabilities expressed by (4.1) and (4.2) for the designs $p_{(1)}(\cdot)$ and $p_{(2)}(\cdot)$ respectively, but the realised sample size has a lower variability than under design II.

As a concluding remark we note that the methods based on PRN techniques are very easy to implement and that, for some designs, unbiased estimators of the variance may be computed. However, designs based on probability proportional to size without replacement may only be approximated: in order sampling the theoretical inclusion probabilities $\pi_{(i)k}$ ($i=1$ or 2) are only approximated, and collocated sampling does not guarantee that the fixed sample sizes $\mathbf{n}_1$ and $\mathbf{n}_2$ are attained.

4.3. An algorithm for controlling the expected sample sizes

Let ${}^{*}\boldsymbol{\pi}_{(1)} = ({}^{*}\pi_{(1)1},\dots,{}^{*}\pi_{(1)k},\dots,{}^{*}\pi_{(1)N})'$ and ${}^{*}\boldsymbol{\pi}_{(2)} = ({}^{*}\pi_{(2)1},\dots,{}^{*}\pi_{(2)k},\dots,{}^{*}\pi_{(2)N})'$ denote the initial vectors of inclusion probabilities of the designs $p_{(1)}(\cdot)$ and $p_{(2)}(\cdot)$, with $\sum_{k \in U_{r.}} {}^{*}\pi_{(1)k} = n_{r.}$ and $\sum_{k \in U_{.c}} {}^{*}\pi_{(2)k} = n_{.c}$.

Starting from the initial probabilities ${}^{*}\pi_{(1)k}$ and ${}^{*}\pi_{(2)k}$, the algorithm calculates modified inclusion probabilities $\pi_{(1)k}$ and $\pi_{(2)k}$. These may be used as inclusion probabilities in the marginal designs, and they guarantee that the expected marginal sample sizes of the joint sample design are equal to the constraints $\mathbf{n}_1 = (n_{1.},\dots,n_{r.},\dots,n_{R.})'$ and $\mathbf{n}_2 = (n_{.1},\dots,n_{.c},\dots,n_{.C})'$. The algorithm makes use of the condition $\pi_k = \max(\pi_{(1)k}, \pi_{(2)k})$, which is true for Poisson sampling and only approximately true for the other designs. The algorithm consists of the following steps.

Step 0. Initialisation.
Denote by $t$ the iteration index ($t=0,1,2,\dots$). At $t=0$ set ${}^{0}\pi_{(i)k} = {}^{*}\pi_{(i)k}$ ($i=1$ or 2).

Step 1. Calculation of the updated modified probabilities. The updated modified inclusion probabilities ${}^{t}\pi_{(i)k}$ ($i=1$ or 2) at iteration $t$ ($t=1,2,\dots$) are obtained by solving the following constrained minimisation problem:

$$\min \sum_{k \in U} \sum_{i=1}^{2} G\left({}^{t}\pi_{(i)k};\,{}^{t-1}\pi_{(i)k}\right)$$

subject to

$$\sum_{k \in U} \left[{}^{t}\pi_{(1)k}\,{}^{t}x_k + {}^{t}\pi_{(2)k}\,(1 - {}^{t}x_k)\right] = n,$$

$$\sum_{k \in U_{r.}} \left[{}^{t}\pi_{(1)k}\,{}^{t}x_k + {}^{t}\pi_{(2)k}\,(1 - {}^{t}x_k)\right] = n_{r.} \quad (r=1,\dots,R),$$

$$\sum_{k \in U_{.c}} \left[{}^{t}\pi_{(1)k}\,{}^{t}x_k + {}^{t}\pi_{(2)k}\,(1 - {}^{t}x_k)\right] = n_{.c} \quad (c=1,\dots,C), \qquad (4.3)$$

in which

$$G\left({}^{t}\pi_{(i)k};\,{}^{t-1}\pi_{(i)k}\right) = {}^{t}\pi_{(i)k}\,\ln\!\left({}^{t}\pi_{(i)k} / {}^{t-1}\pi_{(i)k}\right) - {}^{t}\pi_{(i)k} + {}^{t-1}\pi_{(i)k} \quad (i=1,2)$$

is the logarithmic distance function between ${}^{t}\pi_{(i)k}$ and ${}^{t-1}\pi_{(i)k}$, and ${}^{t}x_k$ is an indicator variable that equals 1 if ${}^{t-1}\pi_{(1)k} \ge {}^{t-1}\pi_{(2)k}$ and equals 0 otherwise.

Step 2. Iteration. With the updated values ${}^{t}\pi_{(i)k}$ ($i=1$ or 2), calculate the updated values ${}^{t+1}x_k$ and then compute

$${}^{t}n_{r.} = \sum_{k \in U_{r.}} \left[{}^{t}\pi_{(1)k}\,{}^{t+1}x_k + {}^{t}\pi_{(2)k}\,(1 - {}^{t+1}x_k)\right] \quad (r=1,\dots,R),$$

$${}^{t}n_{.c} = \sum_{k \in U_{.c}} \left[{}^{t}\pi_{(1)k}\,{}^{t+1}x_k + {}^{t}\pi_{(2)k}\,(1 - {}^{t+1}x_k)\right] \quad (c=1,\dots,C).$$

For a fixed small positive quantity $\nu$ close to zero, if the condition

$$\sum_{r=1}^{R} \left|{}^{t}n_{r.} - n_{r.}\right| + \sum_{c=1}^{C} \left|{}^{t}n_{.c} - n_{.c}\right| \le \nu \qquad (4.4)$$

holds, then the algorithm stops and the unit $k$ is selected with the modified probabilities $\pi_{(1)k} = {}^{t}\pi_{(1)k}$ and $\pi_{(2)k} = {}^{t}\pi_{(2)k}$, defined as the solution of (4.3) at iteration $t$; otherwise step 1 is repeated with $t = t + 1$ until condition (4.4) is satisfied.

Note that (4.3) represents a calibration process, which may be solved by the well-known IPF (Bishop et al., 1975) or GIPF (Dykstra, 1985; Dykstra and Wollan, 1987) procedures. The logarithmic distance function prevents values of ${}^{t}\pi_{(i)k}$ lower than 0, while GIPF prevents values of ${}^{t}\pi_{(i)k}$ larger than 1. A third, more empirical procedure to obtain $\pi_{(i)k}$ values that simultaneously satisfy (4.3) and do not exceed 1 is the following: solve (4.3) with the IPF procedure and set ${}^{t}\pi_{(i)k} = {}^{t-1}\pi_{(i)k}$ if ${}^{t}\pi_{(i)k} > 1$.

5. Estimation of domain totals

In this section some considerations on estimation are presented. In order to simplify the notation of the estimators, we denote the populations $U$, $U_{r.}$ and $U_{.c}$ generically as $U_d$ ($d=1,\dots,D=R+C+1$). With analogous notation, $s_d$ and $Y_d$ indicate, respectively, the sample from and the total of interest of the population $U_d$. The direct estimator of the parameter of interest is given by

$${}_{dir}\hat{Y}_d = \sum_{k \in s_d} y_k a_k \qquad (5.1)$$

where the $a_k$ are the base weights. If the probabilities $\pi_k$ are known (as in the case of balanced sampling and Poisson sampling), the base weight is $a_k = 1/\pi_k$. Otherwise, if the $\pi_k$ values are unknown (as in the case of Pareto sampling and sequential Poisson sampling), we may approximate each of them by $\pi_k = \max(\pi_{(1)k}, \pi_{(2)k})$. Another solution is to set the base weights $a_k$ equal to

$$a_k = \begin{cases} \dfrac{1}{2\pi_{(1)k}} & \text{if } k \in \tilde{s}_{(1)} \\[2mm] \dfrac{1}{2\pi_{(1)k}} + \dfrac{1}{2\pi_{(2)k}} & \text{if } k \in s_{(12)} = s_{(1)} \cap s_{(2)} \\[2mm] \dfrac{1}{2\pi_{(2)k}} & \text{if } k \in \tilde{s}_{(2)} \end{cases} \qquad (5.2)$$

where $\tilde{s}_{(1)}$ and $\tilde{s}_{(2)}$ denote respectively the set of units included only in sample $s_{(1)}$ and the set of units included only in sample $s_{(2)}$. Adopting the weights defined in (5.2), the estimator (5.1) is unbiased, as illustrated below:

$$E_p\left({}_{dir}\hat{Y}_d\right) = E_p\left[\sum_{k \in s_{(1)}} \frac{y_k \delta_{dk}}{2\pi_{(1)k}} + \sum_{k \in s_{(2)}} \frac{y_k \delta_{dk}}{2\pi_{(2)k}}\right] = \frac{1}{2} E_p\left[\sum_{k \in s_{(1)}} \frac{y_k \delta_{dk}}{\pi_{(1)k}}\right] + \frac{1}{2} E_p\left[\sum_{k \in s_{(2)}} \frac{y_k \delta_{dk}}{\pi_{(2)k}}\right] = \frac{1}{2}(Y_d + Y_d) = Y_d,$$

where $\delta_{dk} = 1$ if unit $k$ belongs to $U_d$ and $\delta_{dk} = 0$ otherwise, and $E_p$ denotes the expectation over repeated sampling.

Domain estimates could be improved using auxiliary information and a superpopulation model $E_m(y_k) = f(\mathbf{x}_k; \boldsymbol{\beta})$ that, for each unit $k$, links the model expected value $E_m(y_k)$ to the auxiliary vector $\mathbf{x}_k$, where $f(\cdot\,; \boldsymbol{\beta})$ is a functional form depending on the vector $\boldsymbol{\beta}$ of unknown parameters. After obtaining the sample estimates $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$, based on $\{y_k, \mathbf{x}_k;\ k \in s\}$, the predicted values $\hat{y}_k = f(\mathbf{x}_k; \hat{\boldsymbol{\beta}})$ are computed for each unit $k \in U$, assuming that $\mathbf{x}_k$ is known for each unit $k$ in the population.

Two different types of estimator, the Synthetic and the Generalised Regression estimator, are considered. Following Lehtonen et al. (2003), these estimators are expressed by

$${}_{syn}\hat{Y}_d = \sum_{k \in U_d} \hat{y}_k \qquad (5.3)$$

$${}_{greg}\hat{Y}_d = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} a_k (y_k - \hat{y}_k). \qquad (5.4)$$

Another estimator often used in the small area context is the composite estimator, expressed by

$${}_{comp}\hat{Y}_d = \sum_{k \in U_d} \hat{y}_k + \hat{\gamma}_d \sum_{k \in s_d} a_k (y_k - \hat{y}_k),$$

where $\hat{\gamma}_d$ is a domain-specific weight appropriately constructed to have certain optimality properties; it approaches unity as the domain sample size increases and tends to zero as the domain sample size decreases.

The predictions $\{\hat{y}_k;\ k \in U\}$ differ from one model specification to another, depending on the functional form and on the choice of the auxiliary variables. As illustrated in Särndal et al. (1992, p. 399), adopting a linear model, $E_m(y_k) = \mathbf{x}'_k \boldsymbol{\beta}$, if each prediction is obtained by means of a submodel defined on the population $U_d$, the estimators (5.3) and (5.4) agree. Moreover, a linear model makes it possible to define the above estimators knowing only the domain totals of the auxiliary information and the $\mathbf{x}_k$ values for the sampled units. However, knowing the $\mathbf{x}_k$ values for every $k \in U$, it is possible to construct estimators with more efficient predictions $\hat{y}_k$ obtained by generalised linear models (Lehtonen and Veijanen, 1998) or nonparametric regression techniques (Montanari and Ranalli, 2003).

Finally, let us point out that the synthetic estimator relies on the truth of the model and is usually biased; nevertheless, as noted in Lehtonen et al. (2003), if the model incorporates a specific domain effect (fixed or random) the bias is strongly reduced, and when domain-specific terms are incorporated in the model it is advisable to control the sample size at domain level.

6. Monte Carlo experiment to evaluate the sampling strategies

6.1. Background information

The simulation is carried out on real enterprise data. The analysis focuses on the 1999 population of enterprises with 1 to 99 employees belonging to the Computer and related activities sector (2-digit NACE Rev. 1 classification). In order to simplify the empirical analysis, some units with outlier values have been deleted. At the end of the cleaning procedure, the database used for the simulation study has N=0,39 enterprises. Value added and labour cost are the variables of interest chosen in the simulation. The values of these variables are available for each unit in the population from an administrative data source.

According to EU Council Regulation No 58/97 on Structural Business Statistics, the estimation domains are defined as different partitions of the population. In particular, we consider two partitions: (1) geographical region, with 20 marginal domains, hereafter referred to as DOM1; (2) economic activity group (3-digit NACE Rev. 1 classification, with 6 different groups) by size class (defined in terms of number of persons employed: 1=1-4; 2=5-9; 3=10-19; 4=20-99), with 24 marginal domains, hereafter referred to as DOM2. Therefore, the overall number of marginal domains is 44, while the number of cross-classification strata is 480, of which only 360 have one or more population units.

The experiment is based on an overall sample size of 360 units, planning the sample size of each domain of DOM1 and DOM2 so that the coefficient of variation of the domain estimates for the variable number of employees (supposed known at the sampling stage) is bounded below 34.5% for the DOM1 domains and 8.7% for the DOM2 domains. This sample allocation represents a compromise between the Allocation Proportional to Population (APP) size and a uniform allocation for each domain of each partition. In the following we refer to the domains with a planned sample size greater than the APP sample size as oversized domains. These domains need a sample size larger than the APP sample size in order to bound the sampling errors; roughly speaking, these domains may be classified as small domains. The analysis of the empirical results will be focused on these domains. Table 1 reports the planned sample sizes and the APP sample sizes for the domains of the DOM1 and DOM2 partitions. The small domains are reported in grey cells.

Table 1. Population and sample sizes for the domains of interest. For each DOM1 domain (geographical region) and each DOM2 domain (3-digit NACE Rev. 1 group; size class) the table reports the population size, the planned sample size and the APP sample size; a grey cell denotes a small domain. The DOM1 rows are Piemonte, Valle d'Aosta, Lombardia, Trentino A. A., Veneto, Friuli V. Giulia, Liguria, Emilia R., Toscana, Umbria, Marche, Lazio, Abruzzo, Molise, Campania, Puglia, Basilicata, Calabria, Sicilia, Sardegna and Total. [Numeric entries not recoverable from the source.]

In the simulation we have considered seven sampling designs, as reported in Table 2. Five hundred Monte Carlo samples have been selected for each sampling design.

Table 2. Sampling designs used in the simulation study
Abbreviation   Sampling design
STDOM1         Stratified on DOM1 with SRSWOR* in each stratum
STDOM2         Stratified on DOM2 with SRSWOR* in each stratum
BALPOP         Balanced sampling on the marginal sample sizes and on population sizes
BAL            Balanced sampling on the marginal sample sizes
CPAR           Coordinated Pareto sampling
CPOI           Coordinated Poisson sampling
CSPOI          Coordinated Sequential Poisson sampling
*SRSWOR: Simple Random Sampling Without Replacement

In the experiment STDOM1 and STDOM2 have planned sample sizes for a single partition, respectively DOM1 and DOM2. BALPOP is based on (3.3), adding the balancing equations $\sum_{k \in s} \delta_{dk} / \pi_k = N_d$ ($d = 1, \ldots, 44$), while in BAL only the equations (3.3) are considered. Starting from initial probabilities $\pi^*_k = n/N$, the inclusion probabilities for BALPOP and BAL have been obtained by applying the IPF procedure iteratively (four times), as described in Section 3. In the coordinated designs CPAR, CPOI and CSPOI, the marginal sample sizes are satisfied only in expectation over repeated sampling. Starting from initial probabilities $\pi^*_{(1)k} = n_{r.}/N_{r.}$ for $k \in U_{r.}$ and $\pi^*_{(2)k} = n_{.c}/N_{.c}$ for $k \in U_{.c}$, the final inclusion probabilities $\pi_{(1)k}$ and $\pi_{(2)k}$ have been obtained by means of the algorithm described in Section 4; the final solution was found in eight iterations. For each sample the estimates of the domain totals have been computed by the direct, generalised regression and synthetic estimators. In particular, for the three coordinated sampling designs we have examined two direct estimators. The first adopts as direct weight the value $a_k = 1/\max(\pi_{(1)k}, \pi_{(2)k})$, while the $a_k$ weights of the second direct estimator are defined by the expression given in Section 5. We show the results only for the first direct estimator because the simulation study has shown that the second estimator has much higher variability than the first.
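The IPF step mentioned above can be illustrated with a minimal sketch. It assumes only what the paragraph states: uniform starting values $\pi^*_k = n/N$ and alternating proportional adjustments so that the inclusion probabilities sum to the planned sample size in every domain of the two partitions. The function name `ipf_inclusion_probs` and its arguments are hypothetical, every domain is assumed non-empty, and no capping of probabilities at 1 is attempted; the procedure of Section 3 may differ in these details.

```python
import numpy as np

def ipf_inclusion_probs(row, col, n_row, n_col, n_iter=4):
    """Rake uniform inclusion probabilities to two sets of marginal sample sizes.

    row, col     : integer domain codes of each population unit (two partitions)
    n_row, n_col : planned sample sizes per domain (same grand total n)
    n_iter       : number of IPF cycles (four in the experiment described above)
    """
    row = np.asarray(row)
    col = np.asarray(col)
    n_row = np.asarray(n_row, dtype=float)
    n_col = np.asarray(n_col, dtype=float)
    N = row.size
    pi = np.full(N, n_row.sum() / N)          # initial values pi*_k = n / N
    for _ in range(n_iter):
        # adjust so that probabilities sum to n_row within each row domain
        tot_r = np.bincount(row, weights=pi, minlength=n_row.size)
        pi = pi * (n_row / tot_r)[row]
        # adjust so that probabilities sum to n_col within each column domain
        tot_c = np.bincount(col, weights=pi, minlength=n_col.size)
        pi = pi * (n_col / tot_c)[col]
    return pi
```

After the last column adjustment the column margins are met exactly, while the row margins are met up to the remaining IPF error, which is why the number of cycles matters in practice.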
As far as the estimators using auxiliary information are concerned, two simple homoscedastic linear models have been implemented: model (1) uses 10 auxiliary variables, six of which are the economic activity group membership indicators and the remaining four the size class membership indicators; model (2) uses the 44 domain membership indicator variables. The linear model (1) is expressed by $E_m(y_k) = \beta_a + \beta_b$ for $k \in U_a \cap U_b$,

where $U_a$ is the population of enterprises of the a-th ($a = 1, \ldots, 6$) economic activity group, $U_b$ is the population of enterprises of the b-th ($b = 1, \ldots, 4$) size class of the number of employers, and $\beta_a$ and $\beta_b$ are the fixed effects of the a-th economic activity group and the b-th size class. The linear model (2) is $E_m(y_k) = \beta_r + \beta_c$ for $k \in U_{r.} \cap U_{.c}$, where the subscript r ($r = 1, \ldots, 20$) denotes the generic domain belonging to the DOM1 partition, c ($c = 1, \ldots, 24$) denotes the generic domain belonging to the DOM2 partition, and $\beta_r$ and $\beta_c$ are the separate domain-specific effects. We point out that the main aim of the experiment is to compare different sampling designs using the same estimator. In this context the choice of the best model is not a central issue. Hence, we have considered two quite general, feasible models that can be implemented in all situations of planned domains. Model (1) is somewhat more reliable, since the estimates of the regression parameters are based on large sample sizes, while model (2) makes it possible to evaluate the effect of planning the domain sample sizes, although the estimates of each regression parameter are based on small sample sizes. Obviously, more flexible model formulations are possible, as described for instance in Lehtonen et al. (2005). As pointed out in Section 5, under model (2) the synthetic and the generalised regression estimators give identical results. In the following each sampling strategy is indicated in short by the couple (dis, est), where dis indicates one of the 7 sampling designs listed in Table 2 and est assumes the categories dir, syn and greg, respectively for the direct, synthetic (5.3) and generalised regression (5.4) estimators.
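To make model (1) concrete, the sketch below builds the 10 membership indicators (six activity groups plus four size classes), fits the coefficients by weighted least squares on the sample, and returns a synthetic and a generalised regression estimate of one domain total. It uses the textbook forms of these estimators (sum of fitted values over the domain, plus weighted sample residuals for greg), which are assumed here to match (5.3) and (5.4); the function names and arguments are hypothetical. Because the six and the four indicators both sum to one, the design matrix is rank deficient, so `numpy.linalg.lstsq` is used to obtain a minimum-norm solution, which leaves fitted values unaffected.

```python
import numpy as np

def indicator_matrix(a, b, n_a=6, n_b=4):
    """One row per unit: n_a activity-group indicators, then n_b size-class indicators."""
    a = np.asarray(a)
    b = np.asarray(b)
    X = np.zeros((a.size, n_a + n_b))
    rows = np.arange(a.size)
    X[rows, a] = 1.0            # activity group membership
    X[rows, n_a + b] = 1.0      # size class membership
    return X

def syn_greg_total(y_s, X_s, w_s, dom_s, X_U, dom_U, d):
    """Synthetic and generalised regression estimates of the total of domain d.

    y_s, X_s, w_s, dom_s : sample values, regressors, direct weights, domain codes
    X_U, dom_U           : regressors and domain codes for the whole population
    """
    sw = np.sqrt(w_s)
    # weighted least squares; lstsq copes with the rank-deficient indicator matrix
    beta, *_ = np.linalg.lstsq(X_s * sw[:, None], y_s * sw, rcond=None)
    syn = float((X_U[dom_U == d] @ beta).sum())        # sum of fitted values in the domain
    in_d = dom_s == d
    resid = y_s - X_s @ beta
    greg = syn + float((w_s[in_d] * resid[in_d]).sum())  # add weighted sample residuals
    return syn, greg
```

A quick sanity check of the greg form: if the sample is the whole population and all weights equal one, the residual correction restores the true domain total exactly.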
We have computed two quality measures, the average Absolute Relative Bias (ARB) and the average Relative Mean Square Error (RMSE), expressed by

$$ARB_F(dis, est) = \frac{100}{card(F)} \sum_{d \in F} \left| \frac{1}{500} \sum_{i=1}^{500} \frac{{}_{est}\hat{Y}_d^{\,i}(dis) - Y_d}{Y_d} \right|,$$

$$RMSE_F(dis, est) = \frac{100}{card(F)} \sum_{d \in F} \sqrt{ \frac{1}{500} \sum_{i=1}^{500} \left( \frac{{}_{est}\hat{Y}_d^{\,i}(dis) - Y_d}{Y_d} \right)^2 },$$

denoting with: F a specific subset of the marginal domains; card(F) the cardinality of F; ${}_{est}\hat{Y}_d^{\,i}(dis)$ the i-th Monte Carlo sample estimate ($i = 1, \ldots, 500$) of the total $Y_d$ in the strategy (dis, est). In particular, F represents alternatively the subset of small domains of DOM1, of DOM2, or the overall set of small domains (of both DOM1 and DOM2).
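As a small illustration of the two quality measures, the following sketch computes them from a matrix of Monte Carlo estimates. It assumes the usual reading of the definitions: the relative error is averaged over replications within each domain, the absolute value (for ARB) or the square root of the mean squared relative error (for RMSE) is taken per domain, and the result is averaged over the domains in F and expressed in percent. The function name `arb_rmse` is hypothetical.

```python
import numpy as np

def arb_rmse(estimates, true_totals):
    """Average absolute relative bias and relative root MSE, both in percent.

    estimates   : array of shape (R, D), R Monte Carlo replications for D domains in F
    true_totals : array of shape (D,), the known domain totals Y_d
    """
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(true_totals, dtype=float)
    rel = (est - true) / true                          # relative errors, shape (R, D)
    arb = 100.0 * np.mean(np.abs(rel.mean(axis=0)))    # average over domains of |mean rel. error|
    rmse = 100.0 * np.mean(np.sqrt((rel ** 2).mean(axis=0)))
    return arb, rmse
```

With two replications that miss a total of 100 by -10 and +10, the relative bias cancels to zero while the relative root mean square error is 10%, which shows why both measures are reported.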

Simulation results

The Monte Carlo simulation study highlights that the multi-way stratification techniques proposed in this paper are able to keep bias and variability under control compared with the two benchmark strategies, each of which collapses one of the two stratification variables. The main results of the experiment for the set of small domains are shown in Table 3. The table is organised in four blocks: the first illustrates the quality measures of the direct estimator; the second and third blocks are dedicated respectively to the syn and greg estimators based on 10 auxiliary variables (model (1)); the fourth block presents the results of the syn or greg estimators based on the 44 domain membership indicator variables. We restrict the comments to the value added variable, but similar considerations could be expressed for the labour cost variable. In general, the comments refer to the overall set of small domains. Examining first the direct estimates, we observe the following. The two benchmark designs (STDOM1 and STDOM2) have an RMSE value for the unplanned domains equal to 48.8% and 07.49% respectively. These values cause the large RMSE values computed for the overall set of small domains, respectively equal to 0.74% and 55.3%. STDOM2 shows better results than those attained by STDOM1. This finding is explained by the fact that the STDOM2 stratification criterion is correlated with the variables of interest and keeps under control a larger number of small domains than the STDOM1 stratification. As far as the overall set of small domains is concerned, BALPOP is the most efficient design, both in terms of ARB (.06%) and RMSE (3.58%), even if BAL is only slightly worse. The strategies adopting coordinated sampling show worse values than balanced sampling, but they perform better in terms of RMSE than the benchmark strategies. In particular, the CPAR design has the smallest RMSE (44,60%), followed by CSPOI (46,5%) and CPOI (5,67%).
Nevertheless, the ARB values are quite high and larger than that attained by the STDOM design. These findings depend on the DOM domains, where the ARB values are greater than %.

Table 3. Comparison of small domains sampling strategies: average Absolute Relative Bias (ARB) and Relative Mean Square Error (RMSE)
Columns: ARB and RMSE for DOM1, DOM2 and Overall, reported separately for Value Added and Labour Cost.
Rows: the seven sampling designs (STDOM1, STDOM2, BALPOP, BAL, CPAR, CPOI, CSPOI) within each of four blocks: direct estimator (block 1); synthetic estimator with 10 auxiliary variables (block 2); generalised regression estimator with 10 auxiliary variables (block 3); synthetic or generalised regression estimator with 44 auxiliary variables (block 4).
[Table body not reproduced.]

Considering the synthetic estimator based on 10 auxiliary variables, some issues may be pointed out. All designs are characterized by a large bias. The STDOM has an ARB equal to 3.99% (although it has an unacceptable RMSE, equal to 65,6%). The rest of the designs have ARB values higher than 8%. This evidence is a warning against the use of the synthetic estimator.

The STDOM design has the lowest RMSE (6,6%), because of a strong reduction of the DOM variance. However, its ARB value (0.34%) is the largest of all designs. The behaviour of the balanced and coordinated designs in terms of bias and variance is more or less the same. BAL has the lowest ARB (8.33%) and RMSE (3.6%) values. The experimental results for the greg estimator, in the third block of Table 3, suggest some considerations. All the designs show strong improvements in the quality measures. In general, the ARB measure is remarkably reduced with respect to the same indicator computed for the synthetic estimator. Only the STDOM still presents a high ARB value (7.40%). In the STDOM, the reduction of the bias is more than offset by the increase in variability. This produces an RMSE equal to 34.05%. Both the balanced and the coordinated designs perform well, though the balanced designs are slightly better, their RMSE being roughly equal to 3%. The CPAR design is the best coordinated sampling design in terms of RMSE (5.8%), with a small ARB value (.37%). The CSPOI design has the lowest ARB value (.6%) and a bounded RMSE value (6.00%). Therefore, these results highlight the equivalence of the CPAR and CSPOI designs. Finally, the syn or greg estimator based on 44 auxiliary variables shows results analogous to those of the greg estimator based on 10 auxiliary variables. The balanced designs are the best, with a slight preference for BALPOP sampling. The CPAR and CSPOI designs are quite close to the balanced designs, even if the bias increases. As a general finding, the balanced designs seem to offer the best strategies for keeping under control the bias and variance of the overall set of small domains. Like the balanced designs, the coordinated sampling designs perform well compared with the benchmark designs. In particular, CPAR and CSPOI are preferred to CPOI and could be a valid alternative to balanced sampling when using the syn and greg estimators.
The conclusion is that, for all blocks, BALPOP shows the best overall performance with respect to bias and accuracy, and that, within BALPOP, the strategy with greg estimation based on the ten auxiliary variables (block 3) is a safe choice for both value added and labour cost. Furthermore, the results show that the synthetic estimator of block 2 must be considered carefully, because the bias can be unexpectedly large and the squared bias would then be the dominating part of the RMSE.

Conclusions

In the present work balanced sampling and coordinated sampling, useful for obtaining planned sample sizes for domains belonging to different population partitions, have been examined. In particular, these sampling methods may be applied to deal with small area problems, by planning the sampling design so as to have a planned size for each domain. These methods are useful for defining an overall strategy that considers the design and estimation phases jointly. This brings advantages in the different approaches to finite population inference: in the design-based framework it is possible to compute direct domain estimators with a prefixed level of precision, while in the model-assisted or model-based framework the presence of sample units in each estimation domain makes it possible to use models with specific small area effects, allowing more accurate predictions of the parameters of interest at small area level. A straightforward method to obtain planned domains is to stratify by combining all the variables which define the domains of interest. This method has some problems; first of all, it could produce a large overall sample size, depending on the number of strata. To solve this problem some techniques, generally referred to as multi-way stratification designs, have been proposed. Nevertheless, the main weakness of these techniques is their computational complexity. This paper studies in detail the use of two sampling techniques, the balanced and coordinated sampling designs, to control the sample size in all the domains of interest without suffering from this computational complexity. The experimental results, obtained on real business population data, show the optimal behaviour of balanced sampling with different direct and indirect estimators. Pareto sampling seems to be the best design among the coordinated ones; when it is used jointly with estimators involving auxiliary information, its estimates are only slightly worse than those computed by means of balanced designs.
Furthermore, the strategies based on Pareto sampling may be useful if the sample survey needs to be coordinated over time. This may be an effective sampling strategy for many business surveys. Finally, we point out that the paper has focused on the feasibility of the sampling selection techniques. The topics connected with the estimation of variance have been neglected and will be treated in depth in future research.
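The remark on coordination over time can be illustrated with a minimal sketch of Pareto sampling driven by permanent random numbers, following the ranking rule of order sampling: each unit k receives the value $Q_k = [u_k/(1-u_k)] / [\lambda_k/(1-\lambda_k)]$ and the n units with the smallest $Q_k$ are selected. Here $u_k$ is a uniform random number kept fixed for the unit across survey occasions and $\lambda_k$ is its target inclusion probability. The function name `pareto_sample` and the example data are hypothetical, and the two-partition coordination algorithm of Section 4 is not reproduced.

```python
import numpy as np

def pareto_sample(lam, u, n):
    """Pareto order sampling: keep the n units with the smallest ranking values.

    lam : target inclusion probabilities in (0, 1), summing to about n
    u   : permanent random numbers in (0, 1), one per unit, reused over time
    n   : fixed sample size
    """
    lam = np.asarray(lam, dtype=float)
    u = np.asarray(u, dtype=float)
    q = (u / (1.0 - u)) / (lam / (1.0 - lam))   # Pareto ranking variable
    return np.sort(np.argsort(q)[:n])           # indices of the selected units

# Hypothetical example: two occasions sharing the same permanent random numbers.
rng = np.random.default_rng(2006)
size = rng.uniform(1.0, 10.0, 100)              # illustrative size measure
u = rng.uniform(0.0, 1.0, 100)                  # permanent random numbers
lam_t1 = 20 * size / size.sum()
lam_t2 = 20 * (size * 1.05) / (size * 1.05).sum()
s_t1 = pareto_sample(lam_t1, u, 20)
s_t2 = pareto_sample(lam_t2, u, 20)
overlap = np.intersect1d(s_t1, s_t2).size       # units retained on both occasions
```

Reusing the same `u` on both occasions is what produces positive coordination: units with small permanent random numbers tend to stay in the sample as long as their target probabilities change little.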

REFERENCES

BISHOP Y., FIENBERG S., HOLLAND P. (1975) Discrete Multivariate Analysis, MIT Press, Cambridge, MA.
BRYANT E. C., HARTLEY H. O., JESSEN R. J. (1960) Design and Estimation in Two-Way Stratification, Journal of the American Statistical Association, 55.
CAUSEY B. D., COX L. H., ERNST L. R. (1985) Applications of Transportation Theory to Statistical Problems, Journal of the American Statistical Association, 80.
CHAUVET G., TILLÉ Y. (2006) A Fast Algorithm of Balanced Sampling, to appear in Journal of Computational Statistics.
DEVILLE J.-C., TILLÉ Y. (2000) Selection of several unequal probability samples from the same population, Journal of Statistical Planning and Inference, 86.
DEVILLE J.-C., TILLÉ Y. (2004) Efficient Balanced Sampling: the Cube Method, Biometrika, 91.
DYKSTRA R. (1985) An Iterative Procedure for Obtaining I-Projections onto the Intersection of Convex Sets, Annals of Probability, 13.
DYKSTRA R., WOLLAN P. (1987) Finding I-Projections Subject to a Finite Set of Linear Inequality Constraints, Applied Statistics, 36.
ERNST L. R., PABEN S. P. (2002) Maximizing and Minimizing Overlap When Selecting Any Number of Units per Stratum Simultaneously for Two Designs with Different Stratifications, Journal of Official Statistics, 18.
JESSEN R. J. (1970) Probability Sampling with Marginal Constraints, Journal of the American Statistical Association, 65.
LEHTONEN R., VEIJANEN A. (1998) Logistic Generalized Regression Estimators, Survey Methodology, 24.
LEHTONEN R., SÄRNDAL C. E., VEIJANEN A. (2003) The Effect of Model Choice in Estimation for Domains, Including Small Domains, Survey Methodology, 29.
LEHTONEN R., SÄRNDAL C. E., VEIJANEN A. (2005) Does the Model Matter? Comparing Model-Assisted and Model-Dependent Estimators of Class Frequencies for Domains, Statistics in Transition, 7.
LU W., SITTER R. R. (2002) Multi-way Stratification by Linear Programming Made Practical, Survey Methodology, 28.
MONTANARI G. E., RANALLI M. G. (2003) Nonparametric Methods in Survey Sampling, in: New Developments in Classification and Data Analysis, Vichi M., Monari P., Mignani S., Montanari A. (eds), Springer, Berlin.
OHLSSON E. (1995) Coordination of Samples using Permanent Random Numbers, in: Business Survey Methods, Cox B. G., Binder D. A., Chinnappa B. N., Christianson A., Colledge M. J., Kott P. S. (eds), Wiley, New York.
RAO J. N. K. (2003) Small Area Estimation, Wiley, New York.
RAO J. N. K., NIGAM A. K. (1990) Optimal Controlled Sampling Design, Biometrika, 77.
RAO J. N. K., NIGAM A. K. (1992) Optimal Controlled Sampling: a Unifying Approach, International Statistical Review, 60.
ROSÉN B. (1997a) Asymptotic Theory for Order Sampling, Journal of Statistical Planning and Inference, 62.
ROSÉN B. (1997b) On Sampling with Probability Proportional to Size, Journal of Statistical Planning and Inference, 62.
ROYALL R., HERSON J. (1973) Robust Estimation in Finite Population, Journal of the American Statistical Association, 68.
SÄRNDAL C. E., SWENSSON B., WRETMAN J. (1992) Model Assisted Survey Sampling, Springer-Verlag, New York.
SINGH M. P., GAMBINO J., MANTEL H. J. (1994) Issues and Strategies for Small Area Data, Survey Methodology, 20.
SITTER R. R., SKINNER C. J. (1994) Multi-way Stratification by Linear Programming, Survey Methodology, 20.
VALLIANT R., DORFMAN A. H., ROYALL R. M. (2000) Finite Population Sampling and Inference: A Prediction Approach, Wiley, New York.

Appendix: an algorithm for the flight phase of the cube method

In the following we give a brief description of the flight phase of the cube method. First we set $\pi(0) = \pi$; next, at time $t = 1, \ldots, T$, repeat the following steps:

Step 1. Generate any vector $u(t) = (u_1(t), \ldots, u_k(t), \ldots, u_N(t))'$, random or not, such that: (i) $u_k(t) = 0$ if $\pi_k(t)$ is equal to 1 or 0, and $u_k(t) \neq 0$ otherwise; (ii) $u(t)$ is in the kernel of the matrix $A = D' \, [Diag(\pi)]^{-1}$, where $D = (x_1, \ldots, x_k, \ldots, x_N)'$ and $Diag(\pi)$ is the diagonal matrix with elements $\pi_1, \ldots, \pi_k, \ldots, \pi_N$.

Step 2. Compute $\lambda_1^*(t)$ and $\lambda_2^*(t)$, the largest values of $\lambda_1(t)$ and $\lambda_2(t)$ such that $0 \le \pi(t) + \lambda_1(t) u(t) \le 1$ and $0 \le \pi(t) - \lambda_2(t) u(t) \le 1$.

Step 3. Select
$$\pi(t+1) = \begin{cases} \pi(t) + \lambda_1^*(t)\, u(t) & \text{with probability } \lambda_2^*(t) / (\lambda_1^*(t) + \lambda_2^*(t)), \\ \pi(t) - \lambda_2^*(t)\, u(t) & \text{with probability } \lambda_1^*(t) / (\lambda_1^*(t) + \lambda_2^*(t)). \end{cases}$$

The procedure is iterated until it is no longer possible to carry out Step 1.
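The three steps can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: the function name `flight_phase` is hypothetical, the matrix A is built from the initial probabilities, and the vector u(t) is simply taken from the null space returned by a singular value decomposition, so it is nonzero as a vector but not necessarily in every non-integer coordinate. The landing phase, which deals with the few probabilities left non-integer, is not covered.

```python
import numpy as np

def flight_phase(pi, X, rng, eps=1e-9):
    """Flight phase of the cube method (sketch).

    pi  : initial inclusion probabilities, shape (N,), strictly between 0 and 1
    X   : balancing variables, shape (N, p)
    rng : numpy random Generator used for the random choice in Step 3
    """
    pi0 = np.asarray(pi, dtype=float)
    X = np.asarray(X, dtype=float)
    A = (X / pi0[:, None]).T                    # p x N, column k is x_k / pi_k
    pi = pi0.copy()
    while True:
        free = np.flatnonzero((pi > eps) & (pi < 1.0 - eps))
        if free.size == 0:
            break
        # Step 1: look for a vector in the kernel of A restricted to free units
        _, sing, vt = np.linalg.svd(A[:, free], full_matrices=True)
        rank = int(np.sum(sing > 1e-10 * max(1.0, sing.max())))
        if rank == free.size:                   # kernel is trivial: Step 1 impossible
            break
        u = np.zeros_like(pi)
        u[free] = vt[-1]                        # last right singular vector spans the kernel
        pos, neg = u > 1e-12, u < -1e-12
        # Step 2: largest moves that keep every coordinate inside [0, 1]
        lam1 = np.min(np.concatenate([(1.0 - pi[pos]) / u[pos], -pi[neg] / u[neg]]))
        lam2 = np.min(np.concatenate([pi[pos] / u[pos], -(1.0 - pi[neg]) / u[neg]]))
        # Step 3: random choice that leaves the expectation of pi unchanged
        if rng.random() < lam2 / (lam1 + lam2):
            pi = pi + lam1 * u
        else:
            pi = pi - lam2 * u
        pi[np.abs(pi) < eps] = 0.0              # snap rounding noise to exact 0 or 1
        pi[np.abs(pi - 1.0) < eps] = 1.0
        pi = np.clip(pi, 0.0, 1.0)
    return pi
```

Each pass fixes at least one coordinate at 0 or 1, so the loop ends after at most N passes, leaving at most p non-integer probabilities while the balancing equations $A\,\pi(t) = A\,\pi(0)$ are preserved.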

STATISTICS IN TRANSITION, March 2006, Vol. 7, No. 4

ESTIMATION OF A DOMAIN TOTAL UNDER NONRESPONSE USING DOUBLE SAMPLING

Wojciech Gamrot

ABSTRACT

The well-known two-phase (or double) sampling procedure proposed by Hansen and Hurwitz (1946) as an antidote to nonresponse bias relies on subsampling survey nonrespondents. For each subsampled unit, efforts are repeated to collect the data it failed to provide in the initial phase of the survey. If there is complete response in the second phase, then unbiased estimates of population parameters may be constructed. Otherwise, if response in the second phase is incomplete, estimators that utilize these data make it possible to reduce the nonresponse bias. Several generalizations of this procedure have been proposed, including the application of arbitrary sampling designs in both phases (see Särndal et al. (1992)) and the use of auxiliary information (see e.g. Rao (1986), Okafor (1994), Gamrot (2003, 2004)) to construct estimates of population parameters. In this paper, the application of the two-phase sampling procedure to the estimation of a domain total is discussed. Available auxiliary information is used to construct ratio estimators under this sampling scheme. Three estimators of a domain total are considered and their properties are derived. Finally, an attempt is made to compare their precision.

Key words: nonresponse, double sampling, two-phase sampling, domain total.

1. Introduction

Consider the finite and fixed population under study U of size N, divided into D domains $U_1, \ldots, U_D$ of sizes $N_1, \ldots, N_D$ respectively. Domain membership of any population unit is unknown before sampling. It is assumed that the domains are quite large. For a typical d-th domain $U_d$, several characteristics may be defined, including the domain total:
$$Y_{U_d} = \sum_{i \in U_d} y_i \qquad (1)$$

76 83 W.Gamrot: Estimation of a Domain Total omain mean: omain variance: Y U = yi () N i U S (3) U (X) = (x i X U ) N i U an omain covariance between two characteristics X an Y: C U (X,Y) = (x i X U )(y i YU ) (4) N In this paper the estimation of a omain total is consiere. The notation introuce above will also be use for other population/omain subsets, by replacing the omain symbol U with other subset symbols (e.g. Y U will represent the mean value of Y in some subset U of the population). It will also be use for other omain characteristics (e.g. X U will represent a omain total for some character X). Let us enote: U = U ; (5) U i U for =...D. N W = ; (6) N W N N = ; (7) N. Two-phase sampling: Consier the following two-phase sampling proceure. In the first phase of the survey a ranom sample s of size n is rawn from U, accoring to some sampling esign p (s), characterize by inclusion probabilities of the first an secon orer respectively enote by π i for i U an π ij for i j U. We assume that n may vary from sample to sample. Stochastic nonresponse is assume, as in the paper of Cassel et al. (983), which means that each i-th population unit has some conitional probability ρ i of responing if it is inclue in the sample. We also enote the conitional probability of i-th an j-th units responing simultaneously, if inclue in the sample s, by ρ ij. The sample s splits into two non-overlapping ranom subsets, s

and $s_2$, of sizes $n_1$ and $n_2$, such that units from $s_1$ respond and units from $s_2$ do not. Nonresponse may be treated as an additional phase of sample selection, governed by some unknown probability distribution $q(s_1 \mid s)$ that Särndal et al. (1992) call the response distribution. After the nonresponse has occurred, the second phase of the survey is carried out, and the subsample $s'$ of size $n'$ is drawn from $s_2$ according to another sampling scheme $p'(s' \mid s_1, s_2)$, characterized by inclusion probabilities of the first and second order respectively denoted by $\pi_{i \mid s_1, s_2}$ and $\pi_{ij \mid s_1, s_2}$ for $i \neq j \in s_2$. Then a second attempt is made to collect the data from the subsampled units. Complete response is assumed in the second phase, which means that the response probabilities $\rho_i$ and $\rho_{ij}$ correspond only to the first phase. After the data collection, the domain membership of units is identified, which makes it possible to determine the following sample subsets:
$$s_{1d} = s_1 \cap U_d \qquad (8)$$
$$s_{2d} = s_2 \cap U_d \qquad (9)$$
$$s'_d = s' \cap U_d \qquad (10)$$
The above-stated assumption that a domain is quite large is necessary for these subsets to be empty with negligible probability. In the framework introduced above, three sources of randomness are defined, associated respectively with the probability distributions $p(s)$, $q(s_1 \mid s)$ and $p'(s' \mid s_1, s_2)$. All expectations will be calculated with respect to these three probability distributions, unless otherwise stated.

3. Estimation of a domain total

Let us introduce a variable $Y_d$, taking values:
$$y_{di} = \begin{cases} y_i & \text{if } i \in U_d \\ 0 & \text{if } i \notin U_d \end{cases} \qquad (11)$$
The domain total of Y may be expressed as a population total of $Y_d$:
$$Y_{U_d} = \sum_{i \in U_d} y_i = \sum_{i \in U} y_{di} \qquad (12)$$
and it is unbiasedly estimated by the statistic (see Särndal et al. (1992)):
$$\hat{y}_{d\pi} = \sum_{i \in s_1} \frac{y_{di}}{\pi_i} + \sum_{i \in s'} \frac{y_{di}}{\pi_i \, \pi_{i \mid s_1, s_2}} \qquad (13)$$

78 834 W.Gamrot: Estimation of a Domain Total which is equivalent to: yi yi ŷ π = + (4) π π π i s i i s i i s, s Accoring to Särnal et al. (99), its variance may be expresse as: V π ij yiyj πij s,s + π = yiyj E pq ; (5) i,j U πiπ j i,j s π π π π i j i s,s j s, s ( ŷ ) or equivalently: V π ij yiy j πij s,s + π = yiy j E pq ; (6) i,j U π π π π π π i j i,j s i j i s,s j s, s ( ŷ ) The symbol E pq ( ) represents the expectation with respect to the first-phase sampling esign p( ) an response istribution q(s s). Assumptions concerning inclusion probabilities in the secon phase are neee to eliminate this expectation operator. When y i = for i U we also obtain an unbiase estimator of the omain size N = in the form: i U y y Nˆ = + = + ; (7) i i i s π π π π π π i i s' i i s, s i s i i s i i s, s 4. A special case We will now consier a special case of the proceure escribe above. Let us assume eterministic nonresponse moel. This means that population is ivie into two non-overlapping strata U () an U (), such that units from U () always respon when contacte, whereas units from U () always fail to provie answers. Consequently we have: ρi = for i U (8) ρi = 0 for i U an ρ = ρ ρ for i, j U. Let us efine subsets: ij i j U U U U = (9) U U = (0)

79 STATISTICS IN TRANSITION, March of sizes N, N with W = N / N(), W = N / N (). Moreover, assume that the sample s an subsample s are rawn accoring to simple ranom sampling without replacement scheme, characterize by inclusion probabilities: n π i = ; () N n(n ) π ij = ; () N(N ) π = n' i s,s n ; (3) n' (n' ) π ij s,s = ; (4) n (n ) for i,j U, with subsample size n ' being the following linear function of the sample nonresponent subset size: n ' = cn ; (5) where 0<c< is a constant fixe in avance. Uner these assumptions, the estimator (4) takes the form: N ŷ π srs = yi + yi ; (6) n i s c i s From (6) the exact variance formula is: V f N N π srs U U n N N c W N N + N W + SU (Y) W W Y U ;(7) c n N N ( ŷ ) = N W S (Y) + W W Y + Assuming N>>, N >>, N >>, N >>, we obtain simpler, approximate expression: AV f n c W c n ( ŷ ) = N ( W S (Y) + W W Y ) + N ( W S (Y) W W Y ); π srs U U U + The first term in (8) is equivalent to the variance of the estimator compute from the first phase sample uner full response (see Särnal et al (99) pp. 39 expr ). The secon term represents the increase in estimator s variability ue to nonresponse an introuction of secon phase. Uner simple ranom U (8)

80 836 W.Gamrot: Estimation of a Domain Total sampling without replacement the estimator Nˆ π of the omain size N takes the form: N Nˆ = + π srs n n ; (9) n c where n an n represent the sizes of the sets s an s respectively. 5. Ratio estimator Let us consier the following superpopulation moel ξ, involving single auxiliary characteristic X, taking values x,...,x N : E ξ (Y i ) = βx i (30) Vξ (Y i ) = σx i where i U an =...D. Application of general least squares to the -th omain leas to the following estimator of the parameter β : YU β ˆ = ; (3) X U Replacing omain totals of X an Y in (3) with their respective estimators calculate accoring to (4) we construct the following estimator of omain total: ŷ ŷ X π ratio = U xˆ ; (3) π Its first-orer bias obtaine using Taylor linearization technique is nil which suggests approximate unbiaseness. Its secon-orer bias may be expresse in the form: π ij B ( ŷ ) ( ) ratio x i R x j y j + N X U i,j U π π i j x i (R x j y j) πij s + E pq ; (33) N X U i,j s π π π π i j i s j s where R = Y / X. The approximate variance is: U U AV ŷ π πiπ ij ( ) ( R x y )( R x y ) + = ratio i i j j i,j U j

81 STATISTICS IN TRANSITION, March ( R x y )( R x y ) i i j j πij s + E pq. (34) i,j s π π π π i j i s j s Let us consier again the special case of eterministic nonresponse an simple ranom sampling without replacement, introuce earlier. Uner these assumptions we obtain: ŷ π srs ŷratio srs = X U xˆ ; (35) π srs where xˆ π srs an ŷ π srs are the estimators of X U an accoring to (6). From (33) the secon-orer bias is: N f B ŷratio srs R SU (X) C (X,Y U X n NWW + nw X U ( ) ( )+ c c ) U Y U calculate ( R S (X) C (X,Y) + W X (R R )) U U U ; (36) where R =. From (34) the approximate variance is: YU / X U AV c W W c n f n ( ŷ ) = N W S (R X Y + ratio srs U ) ( S (R X Y) + W X ( R R ) ) + N U U ; (37) Both the bias an the variance iminish when initial sample size n grows. For U =U the estimator (35) an its properties reuce to forms which are similar to expressions erive by Rao (986), for the case of population mean estimation. 6. Ratio estimator uner full response on X Consier again the moel (30). If there is full response on the character X, then we can construct the following estimator of Y s omain total: ŷ π ŷratio = X U xˆ ; (38) HT where: x i x i xˆ HT = = ; (39) π π i s i i s i

82 838 W.Gamrot: Estimation of a Domain Total is the well-known Horvitz-Thompson estimator. Its first orer bias obtaine using Taylor linearization is nil. The secon-orer bias is: π ( ) ( ) ij B ŷ ratio x i R x j y j ; (40) N XU i,j U π π i j an the approximate variance is: AV ŷ π ij = + E pq i,j U π π i j i,j s ( ratio ) ( R x i yi )( R x j y j ) y iy j πij s π π i j πi s π j s ; (4) For the special case of eterministic nonresponse an simple ranom sampling without replacement the estimator (38) takes form: W XU ŷratio srs = ŷ π srs ; (4) w X From (40) its secon orer bias is: B s N f ( ŷ ) ( R S (X) C (X,Y)) ratio ; (43) srs U U X n U An from (4) its approximate variance is: AV f c W W srs U U + ; (44) U n c n ( ŷ ) N W S (R X Y) + N (S (Y) W Y ) ratio Both bias an variance are ecreasing functions of initial sample size n. For U =U the simplifie estimator (4) an its properties reuce to expressions which are similar to those obtaine by Rao (986), for the case of population mean estimation. 7. The comparison of estimators It is easy to prove that AV(ŷratio srs ) < AV(ŷratio srs ) if an only if: R C (X, Y) + W Y X U U U < ; (45) SU (X) + W X U Moreover, AV(ŷratio srs ) < AV(ŷπ srs ) if an only if: cvu (X) W ρ U (X,Y) > ; (46) cv (Y) cv (X) U U

83 STATISTICS IN TRANSITION, March In the expression above ρ (X,Y) C (X,Y) / S (X)S (Y) is the omain U = U correlation coefficient between X an Y, whereas cv U (X) an cv U (Y) represent coefficients of variation of X an Y respectively. Consequently, if both conitions (45) an (46) are satisfie, then AV(ŷratio srs ) < AV(ŷπ srs ). On the other han, if both conitions (45) an (46) are not met, then AV(ŷratio srs ) AV(ŷπ srs ). Let us consier a simple case of auxiliary characteristic X taking values x i = for i U. Ratio estimators (35) an (4) respectively take the form: * N ŷ ratio srs = ŷ π srs ; (47) Nˆ π srs an * W ŷ ratio srs = ŷ π srs ; (48) w Where w =n /n is a sample fraction of omain units. If N <N, then it is easy to show that ŷ * ratio srs has lower approximate variance than ŷ π srs. Moreover, if Y U < Y U, which shoul usually be the case, then the conition (45) hols an the estimator ŷ * ratio srs has even lower approximate variance than ŷ * ratio srs. Consequently, even for the trivial case of auxiliary variable being a constant, the propose ratio estimators are an attractive alternative to the stanar estimator ŷ π srs. U U 8. Conclusions In this paper the problem of estimating the omain total in finite an fixe population was consiere. Three estimators of a omain total were propose for two-phase sampling uner nonresponse incluing the stanar estimator that uses only observe values of variable uner stuy an two ratio estimators making use of available auxiliary information. Secon-orer biases an approximate variances were erive for each estimator in the general case of arbitrary sampling esign in both phases an arbitrary response istribution as well as in the special case of simple ranom sampling without replacement an eterministic nonresponse. These properties were evaluate without making any moel assumptions about istributions of variables. The superpopulation moel was introuce solely to motivate the construction of ratio estimators. 
Finally, approximate variances of estimators were

compared for deterministic nonresponse. Sufficient conditions for each estimator to be more precise than the others were given. It should be emphasized that several other superpopulation models may be used as a foundation for constructing ratio or regression estimators, including those involving multivariate auxiliary information and/or strength-borrowing (e.g. a model with slope parameters β equal in the population or in some subgroup of domains). These alternative approaches may lead to even better results in terms of bias or variance. Finally, it should be stressed that the estimation techniques considered in this paper may be used provided that the domain size is quite large, which ensures that the intersection of the respondent subset and the domain, as well as the intersection of the subsample and the domain, are nonempty.

REFERENCES

CASSEL C. M., SÄRNDAL C. E., WRETMAN J. H. (1983) Some Uses of Statistical Models in Connection with the Nonresponse Problem, in: Incomplete Data in Sample Surveys, W. G. Madow, I. Olkin (eds.), Academic Press, New York.
GAMROT W. (2003) On Some Ratio Estimators in Two-Phase Sampling for Nonresponse, in: Metoda Reprezentacyjna w Badaniach Ekonomiczno-Społecznych, Cz. II, J. Wywiał (ed.), Prace Naukowe AE Katowice.
GAMROT W. (2004) On Application of a Certain Classification Procedure to Mean Value Estimation Under Double Sampling for Nonresponse, in: Daniel Baier and Klaus-Dieter Wernecke (eds.): Innovations in Classification, Data Science, and Information Systems. Proc. 27th Annual GfKl Conference, University of Cottbus, March 12-14, 2003. Springer-Verlag, Heidelberg-Berlin, 2004.
HANSEN M. H., HURWITZ W. N. (1946) The Problem of Nonresponse in Sample Surveys, Journal of the American Statistical Association, 41.
OKAFOR F. C. (1994) On Double Sampling for Stratification with Subsampling the Nonrespondents, The Aligarh Journal of Statistics, vol. 14.
RAO P. S. R. S. (1986) Ratio estimation with subsampling the nonrespondents, Survey Methodology.
SÄRNDAL C. E., SWENSSON B., WRETMAN J. H. (1992) Model Assisted Survey Sampling, Springer-Verlag, New York.

STATISTICS IN TRANSITION, March 2006, Vol. 7, No. 4

ADAPTATION OF EURAREA EXPERIENCE IN BUSINESS STATISTICS IN POLAND

Jan Paradysz and Tomasz Klimanek (Poznań University of Economics)

ABSTRACT

The paper deals with a tentative application of the EURAREA approach to small business statistics in Poland. The study was aimed at accounting for and applying tax data for a more effective use of a sample survey of small businesses with up to 9 employees. To achieve this aim, several more specific objectives and tasks were set out: identifying available sources of information about the economic activities of small businesses (SP3 survey, Database of Statistical Units and tax register) and assessing the possibility of integrating them; determining the applicability of indirect estimation methods in compliance with the standards developed within the EURAREA project; analysing the precision of indirect estimates in comparison with traditional estimation techniques. After a thorough analysis, it seems impossible to apply the EURAREA approach automatically to small business statistics, due to marked differences between statistical unit distributions. The application of small domain estimation to small business statistics also needs more tax information over a longer period.

Key words: small domain estimation, tax register, small business statistics.

Introduction

Polish economic statistics faced a serious challenge in the 1990s. It was caused by a rapid change in the structure of enterprises, the widening scope of the shadow economy and the increasing demand for information for small territorial areas and branch cross-sections. Then quite new challenges emerged: the access to new alternative sources of data,

86 84 J.Paraysz, T.Klimanek: Aaptation of EURAREA Experience improvement as far as ata processing is concerne, an new estimation techniques. The information issue by Central Statistical Office (CSO) so far was available only for NUTS level (voivoships of Polan 6 territorial units) an only for sections of Polish Classification of Economic Activities (PKD sections 4 categories). The majority of tables gave estimates for PKD sections an only 4 tables were evote to voivoship ivision. The first step for obtaining more etaile information was to combine the small omains together an to get a cross-section: PKD section an voivoship. However, proucing irect estimates for such small omains results in large MSEs. So our goal was to apply small omain methoology worke out uring the EURAREA consortium to evaluate the quality an performance of the estimators. Data sources about economic activity of small businesses Keeping in min the topic of the presentation, it is important to give the escription of some Polish sources of information which are of special importance for enterprise statistics. The main atabases are as follows: Business Register (REGON), Tax Register, Database of Statistical Units (BJS), Survey on small enterprises (SP3). Accoring to Central Statistical Office (CSO), National Official Business Register, REGON is a continuously upate set of information on subjects of national economy maintaine as an IT system in the form of a central atabase an local atabases. CSO unerlines the positive attributes of REGON 3 However, there were many reservations raise as far as REGON register as atabase for enterprises is concerne. The most severe one eals with upating the information on enterprises. It particularly refers to cases of changes in the profile of company operation, termination or suspension of activity, mergers or ivisions of companies. That criticism raise especially by statistical staff yiele the onset of preliminary works on Enterprises Register. 
The works are being conucte in CSO an they aim at creating the register which is planne to be upate systematically an enriche with the ata especially from tax register. This kin of register is calle, accoring to the nomenclature use in CSO, the Economic Activity of Microenterprises in 00 (in Polish), GUS, Warszawa, 003, p Accoring to CSO, the REGON register enables to reach the ientification cohesion an uniformity of escriptions use in official registers an information systems of public aministration. It provies general characteristics of businesses operating in national economy an is use as a frame for business surveys.

Database of Statistical Units (BJS). The BJS is a better frame than REGON for business surveys.

In Poland there are several types of surveys on enterprises. These surveys take into account the size of an enterprise (the number of employees). The biggest companies, employing 50 and more, have to provide a full monthly report. The medium-size ones, where the number of employees is between 10 and 49, take part in a 0% survey. The smallest ones, in which the number of employees is not higher than 9, take part in a survey as well, but the sample rate is 5% or,000 enterprises in that case. This last type of survey is called SP3 and is of special interest to us. The CSO commissioned the University of Economics to carry out an experimental attempt to combine the SP3 survey with the tax register. The newly created data set became, for the first time in Poland, the subject of application of small area estimation techniques to enterprise statistics.

The discussion on the use of small area techniques ran parallel to the process of creating the new system of enterprise statistics and has been conducted in the CSO and also in several Polish scientific centers since 99. The studies on applying the new indirect estimation methods were considerably accelerated after the EURAREA consortium was established. In 2003 the idea of making use of the methodological findings of the EURAREA project was put forward. This task was assigned to the research team from the Center of Regional Statistics in Poznan.

Footnote: The system of business statistics was discussed by I. Zagoździnska (1996).
Footnote: More than 0 employees in trade; see A. Witkowska, M. Witkowski (1997) and A. Witkowska (1999).
Footnote 3: The results of the Polish efforts were discussed in several papers by J. Kordos (2000), E. Golata (2004) and J. Paradysz (1998).

Based on the experiences of the EURAREA project, the Central Statistical Office (CSO) in Poland made an attempt to use small area statistics methods to improve estimation precision with respect to basic information about economic activities of small businesses. The study was aimed at accounting for and applying tax data for a more effective use of the sample survey of small businesses with up to 9 employees. To achieve this aim, several more specific objectives and tasks were set:
a) identifying available sources of information about economic activities of small businesses (SP3 survey, Database of Statistical Units and tax register) and assessing the possibility of integrating them;

b) determining the applicability of indirect estimation methods in compliance with the standards developed within the EURAREA project;
c) analysing the precision of indirect estimates in comparison with traditional estimation techniques.

Footnote: The EURAREA "Standard" Estimators and performance criteria, Office for National Statistics, UK.

According to CSO estimates, there were about.6m small businesses operating in Poland in 00. Small businesses are categorised as micro-companies, where the number of employees does not exceed 9 people. Over 98% of small businesses are self-employed individuals and only 6,000 of them are legal entities. The major part of the population surveyed were trading businesses (37%), followed by transportation, construction and industry-related units, each accounting for 0% of the total. Small businesses employ about 3.m people, 40% of whom are employed in trade. Small businesses operate mainly on local markets, but the sample size is inadequate to make any generalisation in a local cross-section.

In the published source containing the survey information, only the last 3 tables present data in a regional cross-section, i.e. across voivodships, without any further breakdown. They include data about the number of units, the number of staff, average gross wage, revenue and costs. The tables lack information in a regional cross-section by type of business activity as classified in the PKD. Using traditional estimation techniques, which disregard auxiliary variables not available for the sample, it was impossible to construct tables including data about voivodships along with PKD categories. The first opportunity to make use of small area estimation appeared once the Ministry of Finance had granted CSO access to tax registers.

The study objective, specified as accounting for and applying tax data for a more effective use of a representative survey of small businesses with up to 9 employees, was understood in a twofold manner. First of all, it was a verification of the hypothesis that estimation precision can be improved in the studies available to date. Secondly, it was intended as a possible extension of the estimation scope to the joint distribution by voivodship and economic activity (PKD division). The estimation was conducted for four basic variables used to characterise business activity: the number of staff employed under a contract, average monthly wage, average revenue per unit, average costs per unit. Estimations were made for the following cross-sections:
voivodships, i.e. 16 domains identified as I_W,

PKD section of economic activity, i.e. domains marked as I_S,
voivodships and PKD categories combined, i.e. 76 domains marked as I3.

Footnote: These are regions in Poland (NUTS 2). Their codes according to CSO notation are as follows: 02 Dolnośląskie, 04 Kujawsko-pomorskie, 06 Lubelskie, 08 Lubuskie, 10 Łódzkie, 12 Małopolskie, 14 Mazowieckie, 16 Opolskie, 18 Podkarpackie, 20 Podlaskie, 22 Pomorskie, 24 Śląskie, 26 Świętokrzyskie, 28 Warminsko-mazurskie, 30 Wielkopolskie, 32 Zachodniopomorskie.
Footnote: Codes of the statistical classification of economic activities: D Manufacturing, F Construction, G Wholesale and retail trade; repair of motor vehicles, motorcycles and personal and household goods, H Hotels and restaurants, I Transport, storage and communication, J Financial intermediation, K Real estate, renting and business activities, L Public administration and defense; compulsory social security, M Education, N Health and social work, O Other community, social and personal service activities, P Activities of households, Q Extra-territorial organizations and bodies. Considering the small number of units, categories A, B, C and E were excluded from the analysis and grouped together as Others.

It should be noted that using administrative data sources for purposes of public statistics in Poland is still in its preliminary stages. Only in recent years, as a result of lobbying on the part of the Statistical Council, has it become possible to use auxiliary data to support survey results. Therefore the study presented below is experimental in its scope. It is the first Polish attempt to apply indirect estimation methodology to economic data incorporating administrative registers.

The analysis refers to data for the year 00 from the following databases made available by CSO: the SP3 survey, the Database of Statistical Units (BJS), and tax registers. The basic source of data about the economic activity of small businesses is the annual SP3 survey conducted by CSO. A sample of 4% of the total number of units in the population is drawn on the basis of a stratified sampling design. The sampling algorithm was developed by the CSO Survey Design and Organisation Department (cf. Zagoździnska (2003)). The sampling frame was constructed on the basis of the Database of Statistical Units as available at the end of October 00. The strata were determined according to 4 criteria:
unit location (at the voivodship level);
PKD section;
unit legal status: natural or legal persons;
unit size categorised into two brackets: up to 5 employees, 6 to 9 employees.

Using data from the SP3 survey for 2000, the variance and coefficients of variation for the variable revenue in each cross-section were estimated. The number of units in the sample was determined in such a way as to ensure an approximately stable estimation precision in each voivodship and PKD category for the variable revenue. Sampling for each stratum was conducted separately. When the number of units was low, the sample included all the units specified in

the frame. A more detailed discussion of the sampling design is presented by W. Niemiro et al. (00).

The main problem in the study of small business units is the question of completeness. In the course of research, it turned out that a considerable number of units drawn for the sample were no longer economically active. In 00 the number of units sampled was 4,600. Nearly 9% of them had suspended their activities, 9.4% had been closed down, 3% had not started operating at all. There were also units that could not be contacted since the address information was no longer valid; these constituted 0.8% of the total. Out of the remaining (55.900) units drawn for the sample, 83% returned a complete questionnaire form while 7% refused to participate. All in all, the final sample in the SP3 survey contained 44,807 units. The survey results have been generalised for the population across PKD divisions or voivodships. Generalised results were weighted on the basis of the corrected sample, i.e. excluding all units that had suspended their activities, closed down, not started their operation or were otherwise not included in the sample.

The SP3 survey includes basic information about economic activities of small businesses. The questionnaire form was divided into 6 sections concerning the following areas:
a) employees and wages
b) the value of the fixed assets and capital investments
c) VAT and income tax
d) revenue and costs
e) reserve value
f) other special information

The second source of information about small businesses is the Database of Statistical Units (BJS). It was used as a frame for CSO-conducted surveys on the population of national economy units. BJS contains data from the official Register of Units in the National Economy, called REGON, and auxiliary information from other sources. BJS is updated annually on the basis of obligatory statistical reporting by large business units and surveys of medium-size and small businesses. In 00 the Database of Statistical Units contained [...] records. It includes basic company identification data and information about the declared number of staff.

Footnote: The creation of such a database was made possible under clause 0 of the Public Statistics Act issued on 9 September 1994.

The third and last of the available sources is the so-called statistical PIT (personal income tax) database. It is a tax statement prepared for statistical purposes. In addition to identification data that can be matched with those found in BJS and the SP3 survey, statistical PIT contains four other variables: revenue,

costs, gain and loss. After matching the tax register database with the BJS database, there were 899,0 records in all.

The study was based on the assumption that gaining access to tax register information should improve small business statistics considerably. It was assumed that this particular source should be very reliable. In addition, its presumed completeness was expected to guarantee access to information unavailable until then, providing cross-sectional data by PKD category at a regional level, possibly even at a local level, i.e. powiats (NUTS4). It turned out, however, that the database of tax records matched with those from BJS and statistical PIT was seriously incomplete. It is beyond our competence to account for this situation. It was assumed that BJS, created as a frame for sampling, was a complete set comparable to the general population. A similar expectation, based on the present tax system, concerned the tax register. In fact, out of the nearly 900,000 records it was impossible to match 4% of records from the SP3 survey. This means that the tax register did not contain over 40% of the units sampled. The ultimate set of combined and matching records from all three sources was reduced from 44,807 to 6,64 units.

It seems that one reason for the lack of completeness of the tax register could be the large percentage of units which were self-employed individuals; for 5% of them business activity was an additional place of work. According to the Polish tax system, the obligation to submit PIT-5 statements does not apply to units whose income has not exceeded a certain statutory amount exempt from tax. In this case, such individuals submit only an annual statement that may not have been made available for statistical purposes. To avoid further reduction of the sample, it was considered necessary to incorporate auxiliary variables at the domain level rather than use models with auxiliary variables at the unit level.

The most numerous sections include G 5,435 units (34.4%) and D 8,674 units (9.4%). The least numerous category is B 67 units (0.4%). In the regional cross-section the most numerously represented voivodships include voivodship units (5.5%) and the least number of units is to be found in voivodship (3.%) and voivodship (3.%). In the joint cross-section, i.e. by voivodship and category of economic activity, there is no domain with a zero number of units, but the actual number is often only slightly above zero. For many domains the sample representation in the domain did not exceed 50 units.

Footnote: The values are included in the methodological notes of the GUS publication entitled: Economic Activity of Microenterprises in 00 (in Polish), GUS, Warszawa 00, p. 0.

The number of units in the sample across domains obviously affects the precision of estimates obtained by means of traditional methods. For example, estimation precision measured by the extreme values of the relative estimation error (REE) for the whole country and across voivodships or PKD divisions for average revenue per company is the following:
Poland altogether: 4.7%

by PKD section: minimum section G 6.9%, maximum section A %
by voivodship: minimum voivodship 4 9.9%, maximum voivodship %

A small sample size in the domain cross-section can result in various random fluctuations in the values observed in the sample (in terms of intensity and direction). The random factor can significantly affect the parameters of the regression models constructed in the study. For this reason we considered various solutions to the problem of the low number of units in some domains across categories and voivodships combined. The following approaches were tested:
constructing a regression model to account for all domains regardless of the number of units each contains;
grouping small domains (containing few units) to form just one group;
disregarding these domains in the process of developing regression models (Bracha, Lednicki, Wieczorkowski, 2004, p.36).

Characteristics of the target variables

As was mentioned earlier, among the major problems involved in estimating information about economic activity across domains are the small sample size and the incompleteness of tax registers, which makes integration of data sources difficult. Another serious estimation problem is the lack of homogeneity in the distribution of basic target variables such as the number of paid employees, amounts of wages, revenue or costs. Unit distributions by the variables in question are considerably skewed to the right, highly varied and of high kurtosis. On the basis of BJS data it is evident that nearly 70% of units employ only one person. A similar value of this variable can be found in the SP3 survey. In over 69% of units, the staff is limited to the owner or a helping family member. A quarter of the business units surveyed are those that offer only additional work opportunities. One-person businesses which are the main place of employment constitute only 4% of the sample. Only 3% of units create jobs for at least 5 people. Units in the sample have a similar distribution when it comes to revenue per unit. There is a lot of variation in this respect depending on the PKD category. The coefficient of variation ranges from 340% to 5%, the third moment of the distribution is α_3 = 447, whereas the coefficient of kurtosis amounts to α_4 = 34.

The lack of population homogeneity in business statistics creates serious difficulties for model-based indirect estimation. Business surveys often pose a variety of data problems that can be very difficult to resolve simultaneously. For example, the study variable(s) may be highly skewed, there may be a large proportion of zero responses, some negative values and there may be several auxiliary variables that can be used to improve

estimation but these may include some extreme values. (Hedlin et al., 2001, p.7)

Considering the strength of correlation between the estimated and auxiliary variables, and having a limited choice of auxiliary variables, it was necessary to select them taking into account the degree of correlation between data from the SP3 survey, tax registers and BJS. A close relation between variables from the different sources, in particular across PKD categories, and quite a significant correlation across voivodships and categories combined were observed. However, the relations between the variables across voivodships were relatively weak and insignificant at α = 0.05.

In addition to data from the SP3 survey used as auxiliary variables, other sources were incorporated as well: BJS and tax registers. To estimate the amount of average revenue and costs per unit, the same independent variables were used interchangeably. For example, when estimating revenue at the unit level, the values of the independent variable came from the cost variable from the SP3 survey; at the domain level it was the average cost per unit from the tax register. In order to estimate the number of staff and the average monthly wage, the following auxiliary variables were used: at the unit level, the number of staff from the SP3 survey and the income; at the domain level, the average number of staff from BJS and the average amount of income from tax registers.

EURAREA approach

In order to carry out the simulations for the analysis phases of this project, we needed the relevant data (for target variables and covariates) on the population. In the course of constructing a realistic pseudo-population, we drew on model-based techniques similar to those that we use for the estimation itself. This was the key idea in the EURAREA approach. We constructed a pseudo-population based on joined databases obtained from the Ministry of Finance (tax register) and CSO (BJS). To do so, we had to impute the data. We tested two possibilities: linear and logarithmic regression. However, the simulation studies turned out to be unsatisfactory. The MSEs of the seven standard estimators were large, and quite often we obtained negative or extremely large estimates.

In view of the stratified sampling used in the study, it was necessary to modify the EURAREA programme to account for sample weights. The change involved giving up the simulation-based approach, since no pseudo-population was created and the estimation was calculated for one sample only, which was the result of the SP3 survey of 00. Under these circumstances, it was not possible to calculate empirical measures of estimation precision. The direct estimator variance was estimated for domains treated as strata, and for the synthetic regression estimator and the EBLUP estimator a simplified approach was applied.
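The contrast between a weighted direct estimate and a regression-synthetic estimate that borrows a domain-level covariate can be sketched as follows. This is a minimal illustration under stated assumptions, not the EURAREA SAS programme: the function names, the record layout `(domain, weight, y)` and the single-covariate linear model are all hypothetical, and an EBLUP would additionally shrink between the two estimates.

```python
def weighted_domain_means(records):
    """Weighted direct estimate of the mean of y in each domain.

    records: iterable of (domain, weight, y) tuples (hypothetical layout).
    Returns {domain: sum(w * y) / sum(w)}.
    """
    num, den = {}, {}
    for domain, w, y in records:
        num[domain] = num.get(domain, 0.0) + w * y
        den[domain] = den.get(domain, 0.0) + w
    return {d: num[d] / den[d] for d in num}


def synthetic_estimates(direct, covariate):
    """Regression-synthetic estimates from one domain-level covariate.

    Fits direct[d] = a + b * covariate[d] by ordinary least squares over
    the domains that have a direct estimate, then returns the fitted value
    for every domain listed in covariate (sampled or not).
    """
    domains = [d for d in direct if d in covariate]
    n = len(domains)
    mean_x = sum(covariate[d] for d in domains) / n
    mean_y = sum(direct[d] for d in domains) / n
    sxx = sum((covariate[d] - mean_x) ** 2 for d in domains)
    sxy = sum((covariate[d] - mean_x) * (direct[d] - mean_y) for d in domains)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return {d: a + b * x for d, x in covariate.items()}


# Toy data: three sampled domains and one domain (D) with no sample units.
records = [("A", 2.0, 10.0), ("A", 1.0, 40.0),
           ("B", 1.0, 30.0), ("B", 1.0, 50.0),
           ("C", 1.0, 60.0)]
register_mean = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}  # e.g. a tax-register average

direct = weighted_domain_means(records)
synthetic = synthetic_estimates(direct, register_mean)
```

The synthetic estimate exists even for a domain with no sampled units, which is the attraction of indirect estimation; its quality depends entirely on how well the domain-level covariate explains the target variable.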

Conclusions

The Center of Regional Statistics was the first institution in Poland to attempt indirect estimation in small business statistics. The preliminary results are encouraging but require more adequate assessment. It would be risky at this stage to make them available as a final outcome to a wider group of users. The experimental character of the study was also due to its first-time application of data from both the SP-3 survey and the administrative (tax) register, but only for the year 00. The tax register was used as a source of covariate data. Among the 7 estimators under consideration, the best results were obtained using the synthetic and GREG estimators. The other estimators yielded results that were clearly incompatible with what we knew about economic reality. Our attempt at applying the EURAREA approach involving a pseudo-population and estimator validation by means of the Monte Carlo method cannot be regarded as successful. Further research, drawing on experience collected in Western countries with better statistical infrastructure and relying on methodologies more suited to handling economic data, should produce more satisfying results.

It is impossible to apply automatically the experiences gained during the EURAREA project to micro-enterprise statistics. The asymmetry of the distributions of the basic variables describing enterprises (revenues, costs, wages and number of employees) made it impossible to use the approach based on a pseudo-population. This is why empirical estimation of variance (mean square error) could not be applied.

One can expect an improvement in estimation precision for small domains when analysts gain access to tax registers for consecutive years to enrich the pool of auxiliary variables. With respect to individual micro-enterprises, a similar positive effect can be expected after analysing different types of entities in the tax register and accounting for null values and the self-employed.

The problems we faced and the experience gained should not result in terminating the work on finding more adequate methods for enterprise statistics than the ones applied in the EURAREA project. This is especially so because there are reasons for building econometric models based on many more covariates than we had. These covariates could be enterprise characteristics for several periods of time.

Footnote: see Hedlin et al. (2001), Estevao, Hidiroglou, Särndal (1995), Karlberg (2000), Särndal, Lundström (2005)

REFERENCES

BRACHA C., LEDNICKI B., WIECZORKOWSKI R. (2004); Using complex estimation methods for disaggregating data from economic activity survey in 2003 (in Polish), GUS, Warsaw.
ESTEVAO V., HIDIROGLOU M.A., SÄRNDAL C.-E. (1995); Methodological Principles for a Generalized Estimation System at Statistics Canada. Journal of Official Statistics.
GHOSH M., RAO J.N.K. (1994); Small Area Estimation: An Appraisal, Statistical Science, Vol. 9, No. 1.
GOLATA E. (2004); Indirect estimation of unemployment on local labour market. The Poznan University of Economics Publishing House. Habilitation Works, Poznan (in Polish).
HEDLIN D., FALVEY H., CHAMBERS R., KOKIC P. (2001); Does the Model Matter for GREG Estimation? A Business Survey Example, Journal of Official Statistics, vol. 17, No. 4.
KARLBERG F. (2000); Survey Estimation for Highly Skewed Populations in the Presence of Zeroes. Journal of Official Statistics, 16, 229-241.
KORDOS J., PARADYSZ J. (2000); Some experiments in small area estimation in Poland. Statistics in Transition. Vol. 4, Nr 4.
LONGFORD N. (00); Initial Theory Report: Small-Area Estimation with Complex Sampling Design, EURAREA, IST.
MARKER D.A. (2001); Producing small area estimates from national surveys: method for minimizing use of indirect estimates, Survey Methodology, December 2001, Vol. 27, No. 2.
NIEMIRO W., POPINSKI W., WESOŁOWSKI W., WIECZORKOWSKI R. (00); Optimal allocation of sample in terms of uncomplete conducting of survey (in Polish), Wiadomości Statystyczne, No. 9.
PARADYSZ J. (1998); Small area statistics in Poland. First experiences and application possibilities. Statistics in Transition. Vol. 3, Nr 5.
RAO J.N.K. (2003); Small Area Estimation, Wiley Interscience, John Wiley and Sons, Inc., Hoboken, New Jersey.
SÄRNDAL C.-E. (1993); Design-Based Approaches in Estimation for Domains, in: Small Area Statistics and Survey Designs, eds. G. Kalton, J. Kordos, R. Platek, vol. I: Invited Papers, Central Statistical Office, Warsaw.
SÄRNDAL C.-E., LUNDSTRÖM S. (2005); Estimation in Surveys with Nonresponse, John Wiley and Sons Ltd., The Atrium, Southern Gate, Chichester, West Sussex.
SÄRNDAL C.-E., SWENSSON B., WRETMAN J. (1992); Model Assisted Survey Sampling, Springer Verlag, New York, Berlin, Heidelberg, London, Paris, Tokyo, Hong Kong, Barcelona, Budapest.
STANDARD Estimators for Small Areas: SAS Programs and Documentation, EURAREA Reports and SAS Program.
WITKOWSKA A., WITKOWSKI M. (1997); The usefulness of existing statistical data sources for surveys on economic activity of enterprises across the region (in Polish), in: J. Paradysz (ed.) Regional Statistics. Survey and database integration, Poznan.
WITKOWSKA A. (1999); Economic activity of small and medium businesses across the region in the light of small area statistics. Small Area Estimation. Conference Proceedings, Riga, Latvia.
ZAGOŹDZINSKA I. (1996); Selected issues of business statistics in Poland. Statistics in Transition, No. 6.
ZAGOŹDZINSKA I. (ed.) (00); Economic Activity of Microenterprises in 00 (in Polish), GUS, Warsaw.
ZAGOŹDZINSKA I. (ed.) (2003); Economic Activity of Microenterprises in 00 (in Polish), GUS, Warsaw.

STATISTICS IN TRANSITION, March 2006, Vol. 7, No. 4

LOGISTIC REGRESSION MODELS IN SMALL AREA RESEARCH

Krystyna Pruska
Chair of Statistical Methods, University of Lodz, Poland

ABSTRACT

In this paper we consider some simulation experiments for populations which are analysed with respect to a zero-one variable and some auxiliary variables. These populations are divided into many small areas. The aim of the research is to compare the results of estimating a proportion for small areas on the basis of a logistic regression model under several sampling designs. The estimation errors are studied by simulation. The sampling designs used are individual sampling with replacement, stratified sampling with replacement, stratified sampling without replacement, poststratification sampling with replacement, and poststratification sampling without replacement. The sampling designs are applied to the whole population or to a subpopulation. The sample for a small area is the set of population sample elements which belong to the small area (the first group of experiments) or the sample drawn directly from the small area (the second group of experiments). The analysis shows that, in the simulation experiments carried out, the estimation errors are similar for different sampling designs, given the presented estimation method and a similar size of small area sample.

Key words: small area, sampling design, logistic regression model, simulation method.

1. Introduction

Binary data often appear in statistical research. Logistic analysis makes it possible to estimate a proportion on the basis of such data for the small areas considered (a small area is a subpopulation of a given population). In this paper an application of logistic regression models in small area estimation is considered. Simulation methods are applied in the analysis.

The aim of this paper is to compare the results of estimating a proportion for small areas which are obtained on the basis of a logistic regression model for small areas under different sampling designs. One form of the estimator of the proportion is taken. We can analyse the differences between estimates which result from the application of different sampling designs.

2. Logistic model for small areas

We consider a population which is divided into G strata and M small areas. We are interested in a zero-one variable in each small area. The proportion of ones in the i-th (i = 1,...,M) small area is denoted by $\theta_i$. We consider some auxiliary variables, too. Let the vector $x_i$ denote their values for the i-th small area. We can construct the following logistic regression model:

$L^{-1}(\hat{\theta}_i) = x_i' \alpha + \varepsilon_i$ for $i = 1,\dots,M$ (2.1)

where

$L^{-1}(\theta_i) = \ln \dfrac{\theta_i}{1 - \theta_i}$ (2.2)

and $\hat{\theta}_i$ is an estimator of the parameter $\theta_i$, $\alpha$ is the vector of model parameters, and $\varepsilon_i$ is a random error with $E(\varepsilon_i) = 0$.

If we know the values of $\hat{\theta}_i$ for i = 1,...,m (for m small areas which are a random sample drawn from the set of all small areas of the given population) and $x_i$ for i = 1,...,M (that is, for all small areas of the given population), then we can estimate the parameter vector of model (2.1) and the unknown proportions $\theta_i$ for small areas which are drawn and which are not drawn to the sample. We may compare the errors of the estimates of the parameters $\theta_i$ (i = 1,...,M) for small area samples obtained on the basis of different sampling designs.

3. Simulation analysis of estimation errors of proportion for small areas

In order to compare the errors of estimation of a proportion for small areas on the basis of logistic regression models, a simulation analysis was conducted. Three populations A, B, C are created. Each population is a set of elements and is divided into 0 strata and 00 small areas. These populations are sets of points $(y_{igk}, x_{1igk}, x_{2igk})$ where i = 1,...,00, g = 1,...,0, k = 1,...,00, and where i denotes the number of the small area, g the number of the stratum, k the number of the element in the

99 STATISTICS IN TRANSITION, March i-th small area an g-th stratum, y igk is value 0 or an x igk, x igk are values of transformation of ranom numbers generate from normal istribution. The values y igk, x igk, x igk are realizations of ranom variables Y ig, X ig, X ig respectively, which are etermine separately for each small areas an each stratum as follows: Y ig =, where Z ig < c ig, (3.) Y ig = 0, where Z ig c ig (3.) an X ig = U + ξ ig, (3.3) X ig = U + ξ ig, (3.4) Z ig = X ig + 3X ig + ε i, (3.5) where U ~N(0;), ξ ~N(0;(i+g)/500), U ~N(5;), ξ ig ~N(0;(i+g)/000), ig ε i ~N(i/b;/0), b=30 for populations A, C an b=00 for population B; the ranom variables U, ξ, U, ξ ig, ε i are inepenent; the value c ig is the 0-th ig centile of istribution of ranom variable W ig which is the form: W ig = (Z ig 35 i/30) / [6 + 4((i+g)/500) ((i+g)/000) +0.0] for population A, (3.6) W ig = (Z ig 35) / [6 + 4((i+g)/500) ((i+g)/00) +0.0] for population B an C. (3.7) The populations are constructe in such way that the istributions of three ranom variables Y ig, X ig, X ig are etermine for each stratum in each small area separately. The variables X ig, X ig iffer with respect to expectations an variances for given i an g. The variables X ig for fixe g an i =,...,00 have ifferent variances. We observe the same for X ig. If i+g = k for k =,...,0 then variances of variables X ig, X gi are the same for suitable i, g (analogously for X ig ). The expectations of variables Z ig are the same for g =,...,0 an fixe i, these expectations are ifferent for ifferent values of i. The istributions of variables Y ig are ifferent for ifferent variants of (i,g). The g-th stratum in the whole population is sum of g-th strata from all small areas. The number of stratum an number of small area etermine the istribution of ranom variables Y ig, X ig, X ig for given stratum an given small area. 
The distributions of the variables considered for the g-th population stratum are mixtures of the distributions of the variables $Y_{ig}$, $X_{1ig}$, $X_{2ig}$, respectively, for i = 1,...,00. The distributions of the variables considered for the i-th small area (we denote them $Y_i$, $X_{1i}$, $X_{2i}$) are mixtures of the distributions of the variables $Y_{ig}$, $X_{1ig}$, $X_{2ig}$, respectively, for g = 1,...,0. Some parameters of the small areas for populations A, B, C are presented in Table 1. We can see that these parameters differ least in population A and most in population C.
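The construction described above, a zero-one variable obtained by thresholding a latent linear combination of two normally distributed auxiliary variables, can be sketched as follows. This is only an illustration under stated assumptions: the function name `simulate_cell`, all means, standard deviations and coefficients, and the use of one common threshold `c` (instead of a cell-specific $c_{ig}$) are hypothetical choices made for readability, not the constants of populations A, B, C.

```python
import numpy as np

def simulate_cell(i, g, n_elements, rng, b=30.0, c=33.0):
    """Draw (y, x1, x2) for small area i and stratum g.

    All numeric constants below are illustrative assumptions.
    """
    # Auxiliary variables: a common normal component plus a cell-specific one
    # whose spread grows with i + g.
    x1 = rng.normal(10.0, 1.0, n_elements) + rng.normal(0.0, (i + g) / 500.0, n_elements)
    x2 = rng.normal(5.0, 1.0, n_elements) + rng.normal(0.0, (i + g) / 1000.0, n_elements)
    # Area-level shift: the error mean depends on the small area number i.
    eps = rng.normal(i / b, 0.1, n_elements)
    # Latent variable (assumed coefficients) and zero-one outcome.
    z = 2.0 * x1 + 3.0 * x2 + eps
    y = (z < c).astype(int)
    return y, x1, x2
```

With a fixed threshold, the proportion of ones falls as i grows, because the mean of the latent variable rises with i; this is one simple way to make the small area proportions differ.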

Table 1. Some parameters of small areas a) for populations A, B, C
Columns: Population (A, B, C); minimum of θ_i over the small areas; maximum of θ_i over the small areas.
a) Parameter θ_i is the proportion for the i-th small area.
Source: Own calculations.

We assume that a logistic regression model is suitable for describing the dependence between the estimates of small area means and the auxiliary variables. In this study the estimate of a small area mean is the value of the corresponding proportion (we consider different variants with respect to the sampling design). The values of the auxiliary variables are determined on the basis of the sample. The parameters of the logistic regression model are estimated on the basis of information about the small areas which are drawn to the sample. Fifteen small areas were drawn (sampling with replacement) from each population A, B, C. We denote these small areas by $S_1^T,\dots,S_{15}^T$, where T = A, B, C.

Information about the drawn small areas $S_1^T,\dots,S_{15}^T$ can be obtained in different ways. We consider two approaches. In the first approach we draw a sample from the whole population and then select the sample elements which belong to the small area $S_l^T$ for l = 1,...,15 and for given T. In this way we obtain small area samples for the drawn small areas $S_1^T,\dots,S_{15}^T$. In the study four variants of sampling design for obtaining the population sample were applied: stratified sampling with replacement and without replacement, SSWR and SSWOR (units are drawn from each stratum separately and all drawn units form the sample), and poststratified sampling with replacement and without replacement, PSWR and PSWOR (units are drawn from the whole population and then the stratum to which each unit belongs is determined; the strata are the same as those determined earlier). We do not know the small area sample size before sampling. We know only the population sample size. Two sizes of population sample are taken: 4000 and 0000. Next, samples for the small areas are created. In this case the small area sample is a subset of the population sample.
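The first approach, with its four design variants, can be sketched in a few lines. The sketch below is an assumption-laden illustration: the function names `draw_population_sample` and `small_area_subsample` are hypothetical, and proportional allocation of the sample to strata is assumed for SSWR and SSWOR because the text does not state the allocation rule.

```python
import numpy as np

def draw_population_sample(stratum, n, design, rng):
    """Return indices of the units drawn from the whole population.

    stratum : array with the stratum label of every population unit
    design  : one of "SSWR", "SSWOR", "PSWR", "PSWOR"
    """
    N = stratum.size
    replace = design in ("SSWR", "PSWR")
    if design in ("SSWR", "SSWOR"):
        parts = []
        for g in np.unique(stratum):
            units = np.flatnonzero(stratum == g)
            # Proportional allocation is an assumption of this sketch.
            n_g = int(round(n * units.size / N))
            parts.append(rng.choice(units, size=n_g, replace=replace))
        return np.concatenate(parts)
    if design in ("PSWR", "PSWOR"):
        # Draw from the whole population; strata are read off afterwards
        # as stratum[idx] (poststratification).
        return rng.choice(N, size=n, replace=replace)
    raise ValueError(f"unknown design: {design}")

def small_area_subsample(idx, area, a):
    """Keep the sampled units that belong to small area a."""
    return idx[area[idx] == a]
```

The small area sample size is random here: it is whatever part of the population sample happens to fall in the chosen small area, which is why only the population sample size is fixed in advance.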
The small area sample size is not large for the considered populations when the population sample has 4000 elements.

In the second approach samples were drawn directly from each small area $S_1^T,\dots,S_{15}^T$ for given T. In this case, too, four variants of sampling design were applied: stratified sampling with replacement and without replacement (SSWR and SSWOR), and poststratified sampling with replacement and without replacement (PSWR and PSWOR). Two sizes of small

area sample are taken: 40 and 00 (when a proportion is estimated, a 40-element sample is not large). The samples were drawn 000 times for each sampling design and each population. The estimates of the proportion of ones for the i-th small area (the proportion itself is denoted by $\theta_i$) were determined on the basis of these samples and of the logistic regression model. First, the following values were calculated for each population sample, for each sample drawn from the small areas $S_1^T,\dots,S_{15}^T$, and for each sampling design:

$$\hat\theta_{ij} = \bar y_{ij} = \frac{1}{NM_i}\sum_{g=1}^{G}\frac{N_{ig}}{n_{igj}}\sum_{l=1}^{n_{igj}} y_{igjl}, \tag{3.8}$$

$$\bar x_{1ij} = \frac{1}{NM_i}\sum_{g=1}^{G}\frac{N_{ig}}{n_{igj}}\sum_{l=1}^{n_{igj}} x_{1igjl}, \tag{3.9}$$

$$\bar x_{2ij} = \frac{1}{NM_i}\sum_{g=1}^{G}\frac{N_{ig}}{n_{igj}}\sum_{l=1}^{n_{igj}} x_{2igjl}, \tag{3.10}$$

where
i is the number of the drawn small area (i = 1,...,15),
g is the number of the stratum (g = 1,...,0),
j is the number of the repetition (j = 1,...,000),
$NM_i$ is the number of elements in the i-th small area,
$N_{ig}$ is the number of elements in the i-th small area and g-th stratum,
$n_{igj}$ is the number of sample elements which belong to the i-th small area and g-th stratum in the j-th repetition,
$y_{igjl}$ is the value of the random variable $Y_{ig}$ for the l-th sample element from the i-th small area and g-th stratum in the j-th repetition,
$x_{1igjl}$ is the value of the random variable $X_{1ig}$ for the l-th sample element from the i-th small area and g-th stratum in the j-th repetition,
$x_{2igjl}$ is the value of the random variable $X_{2ig}$ for the l-th sample element from the i-th small area and g-th stratum in the j-th repetition.

Next, the following model was considered:

$$L^{-1}(\hat\theta_{ij}) = \alpha_0 + \alpha_1 \bar x_{1ij} + \alpha_2 \bar x_{2ij} + \varepsilon_{ij}, \tag{3.11}$$

where $\varepsilon_{ij}$ is a random error and

$$L^{-1}(\hat\theta_{ij}) = \ln\frac{\hat\theta_{ij}}{1-\hat\theta_{ij}} \tag{3.12}$$

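for each sampling design and for each j-th repetition, separately. The parameters of model (3.11) were estimated by the GLS method on the basis of information about the 15 drawn small areas, for each repetition and each population separately. The estimates of these parameters are denoted by $a_{0j}$, $a_{1j}$, $a_{2j}$ (j = 1,...,000) respectively. Then the following values were calculated:

$$l_{ij} = a_{0j} + a_{1j}\bar X_{1i} + a_{2j}\bar X_{2i} \tag{3.13}$$

and

$$\hat\theta^{L}_{ij} = \frac{\exp(l_{ij})}{1+\exp(l_{ij})} \tag{3.14}$$

for every small area i and j = 1,...,000, where $\bar X_{1i}$ and $\bar X_{2i}$ are the means of the variables $X_{1i}$ and $X_{2i}$ for the i-th small area. Next, the values of the following measures of estimation errors were calculated. For the drawn small areas:

$$DS = \frac{1}{000\,\mathrm{card}(D)}\sum_{j=1}^{000}\sum_{i\in D}\left(\hat\theta^{L}_{ij}-\theta_i\right), \tag{3.15}$$

$$MIN = \min_{1\le j\le 000}\frac{1}{\mathrm{card}(D)}\sum_{i\in D}\left(\hat\theta^{L}_{ij}-\theta_i\right), \tag{3.16}$$

$$MAX = \max_{1\le j\le 000}\frac{1}{\mathrm{card}(D)}\sum_{i\in D}\left(\hat\theta^{L}_{ij}-\theta_i\right), \tag{3.17}$$

and for the undrawn small areas:

$$UDS = \frac{1}{000\,\mathrm{card}(UD)}\sum_{j=1}^{000}\sum_{i\in UD}\left(\hat\theta^{L}_{ij}-\theta_i\right), \tag{3.18}$$

$$MIN = \min_{1\le j\le 000}\frac{1}{\mathrm{card}(UD)}\sum_{i\in UD}\left(\hat\theta^{L}_{ij}-\theta_i\right), \tag{3.19}$$

$$MAX = \max_{1\le j\le 000}\frac{1}{\mathrm{card}(UD)}\sum_{i\in UD}\left(\hat\theta^{L}_{ij}-\theta_i\right), \tag{3.20}$$

where D is the set of indices of the fifteen small areas drawn to the sample and UD is the set of indices of the undrawn small areas.

In the estimation procedure the inclusion probabilities are not taken into consideration. A single form of estimator is used, which makes it possible to analyse the influence of the sampling design on the estimation errors.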
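The estimation steps above (a stratum-weighted sample proportion, a linear model for its logit, and the back-transformation to a proportion) can be sketched as follows. The sketch makes several assumptions that are not in the paper: the function names are hypothetical, ordinary least squares via `numpy.linalg.lstsq` stands in for the GLS method, strata without sample elements are simply skipped, and proportions are clipped away from 0 and 1 so that the logit stays finite.

```python
import numpy as np

def direct_estimate(y, stratum, stratum_sizes):
    """Stratum-weighted proportion: (1/NM_i) * sum_g (N_ig / n_ig) * sum_l y_igl."""
    y = np.asarray(y, dtype=float)
    stratum = np.asarray(stratum)
    total = 0.0
    for g, N_g in stratum_sizes.items():
        in_g = stratum == g
        n_g = int(in_g.sum())
        if n_g > 0:  # assumption: empty strata contribute nothing
            total += N_g / n_g * y[in_g].sum()
    return total / sum(stratum_sizes.values())

def logit(p, eps=1e-6):
    """ln(p / (1 - p)), with clipping (an assumption) to avoid infinities."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def inv_logit(l):
    """exp(l) / (1 + exp(l)), written in an equivalent form."""
    return 1.0 / (1.0 + np.exp(-np.asarray(l, dtype=float)))

def fit_logit_model(theta_hat, x1_bar, x2_bar):
    """Least-squares fit of logit(theta_hat) on an intercept, x1_bar and x2_bar."""
    X = np.column_stack([np.ones(len(theta_hat)), x1_bar, x2_bar])
    coef, *_ = np.linalg.lstsq(X, logit(theta_hat), rcond=None)
    return coef

def predict_proportions(coef, X1_mean, X2_mean):
    """Model-based proportions for any small areas with known auxiliary means."""
    l = coef[0] + coef[1] * np.asarray(X1_mean) + coef[2] * np.asarray(X2_mean)
    return inv_logit(l)
```

In use, `fit_logit_model` would be called once per repetition with the values for the drawn small areas, and `predict_proportions` would then be applied to the auxiliary means of all small areas, drawn and undrawn alike.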

The results of the calculations are presented in Tables 2-4.

Table 2. Values of estimation errors for proportion in case of simulation experiments for population A
Columns: Measure; sampling from the whole population (PSWR, PSWOR, SSWR, SSWOR); sampling from each drawn small area (PSWR, PSWOR, SSWR, SSWOR).
Row blocks: population sample size 4000 and small area sample size 40; population sample size 0000 and small area sample size 00. Each block lists the measures DS, MIN, MAX, UDS, MIN, MAX.
Source: Own calculations.

On the basis of the results obtained for the studied populations, we can notice that the measure DS takes the same or very similar values for a given population and a given (or similar) sample size under different sampling designs. We observe the same for the measures MIN (drawn small areas), UDS and MIN (undrawn small areas). The values of the two measures MAX are more differentiated. In general, we observe smaller estimation errors for the estimator of $\theta_i$ under sampling without replacement and under stratified sampling than under the other designs, but the differences are small. We also observe smaller estimation errors for the larger sample size, which is the expected result.

Generally, the conducted experiments show that, for the applied estimation method, the estimation errors of the proportion for a small area are similar for the different sampling designs and similar sizes of small area samples (with the exception of the values of the two measures MAX and of the measure DS for the smaller sample sizes).

In general, for small area samples of similar sizes, the estimation errors of the proportion estimates obtained from the logistic regression model are similar whether the small area samples are drawn directly from the small areas or consist of those elements of the population samples which belong to the small areas.

Table 3. Values of estimation errors for proportion in case of simulation experiments for population B
Columns: Measure; sampling from the whole population (PSWR, PSWOR, SSWR, SSWOR); sampling from each drawn small area (PSWR, PSWOR, SSWR, SSWOR).
Row blocks: population sample size 4000 and small area sample size 40; population sample size 0000 and small area sample size 00. Each block lists the measures DS, MIN, MAX, UDS, MIN, MAX.
Source: Own calculations.

4. Final remarks

In this study the estimation of a small area proportion on the basis of logistic regression models is considered. In general, the estimation errors of the small area proportion are at a similar level for similar small area sample sizes, whether the small area samples are drawn directly from the small areas or are subsets of population samples formed from the elements belonging to the small areas (different sampling designs are considered).

The results obtained cannot be generalized (because of the small number of experiment variants), but they support the thesis that the estimation errors are similar for similar small area sample sizes (at the levels considered in the paper) and for different sampling designs.

In empirical research we cannot determine the estimation errors in the way used in the simulation experiments. We usually have only one sample and do not know the value of the estimated parameter, so other methods of estimating the errors have to be applied.

In the presented analysis only one method of estimating the proportion for a small area (under different sampling designs) was considered. In the statistical and econometric literature we can find different logistic regression models (see e.g. J. Wiśniewski (1986), K. Jajuga (1990)) and different methods of estimating the proportion for small areas with an application of logistic regression models (see e.g. J.N.K. Rao (2003)). Therefore the analysis needs to be continued.

Table 4. Values of estimation errors for proportion in case of simulation experiments for population C
Columns: Measure; sampling from the whole population (PSWR, PSWOR, SSWR, SSWOR); sampling from each drawn small area (PSWR, PSWOR, SSWR, SSWOR).
Row blocks: population sample size 4000 and small area sample size 40; population sample size 0000 and small area sample size 00. Each block lists the measures DS, MIN, MAX, UDS, MIN, MAX.
Source: Own calculations.

REFERENCES

JAJUGA K. (1990), Models with Discrete Explained Variable. In Bartosiewicz S., Estimation of Econometric Models (in Polish), PWE, Warsaw.
RAO J. N. K. (2003), Small Area Estimation, John Wiley & Sons, New Jersey.
WIŚNIEWSKI J. (1986), Econometric Research of Qualitative Phenomena. Methodology Study (in Polish), Nicolaus Copernicus University, Toruń.

STATISTICS IN TRANSITION, March 2006, Vol. 7, No. 4, pp.

IMPACT OF DIFFERENT FACTORS ON RESEARCH IN SMALL AREA ESTIMATION IN POLAND

Jan Kordos

ABSTRACT

The author considers the impact of different factors on small area estimation (SAE) research in Poland. He begins by distinguishing the impact on official statisticians and on academic statisticians separately. Four events which had a significant impact on SAE research in Poland are discussed: (i) the transition from a centrally planned economy to a market-oriented economy; (ii) the International Scientific Conference on Small Area Statistics and Survey Designs held in Warsaw in 1992; (iii) the International Conference on Small Area Estimation held in Riga, Latvia, in 1999; and (iv) the EURAREA project (2001-2004). These events have had a significant impact on the following statistical activities in Poland: (i) extensive study of the theory of SAE methods and of the application of these methods in other countries; (ii) attempts to apply SAE methods in several fields; (iii) yearly national statistical conferences where SAE methods were presented; (iv) international conferences where Polish statisticians presented their contributions. The author describes the following fields in which SAE methods were used: a) estimation of some employment and unemployment characteristics by region and poviat (county); b) estimation of some characteristics of the smallest enterprises by region and poviat; c) application of the Hierarchical Bayes method in estimation of unemployment by region and poviat; d) estimation of some agricultural characteristics by region and poviat using agricultural sample surveys and agricultural census data. In the concluding remarks some aspects of future development are considered.

Key words: Small area estimation, Official statistician, Academic statistician, Model-based estimation, Data quality, Labour force survey, Agricultural sample survey.

This is an extended and updated version of the paper entitled Impact of the EURAREA Project on Research in Small Area Estimation in Poland, presented at the International SAE2005 Conference on Enhancing Small Area Estimation Techniques to Meet European Needs, University of Jyväskylä, Finland, 27-31 August 2005.
Warsaw School of Economics, Warsaw, Poland.

1. Introduction

Considering the impact of different factors on research in small area estimation (SAE) in Poland, one should begin by distinguishing between official statisticians and academic statisticians. These two groups of statisticians have had different experience with SAE.

In Poland, official statisticians, mainly sampling statisticians, have had permanent contact with the process of data collection and with different kinds of users and their data requirements. In the previous economic system, i.e. before 1989, data for economic statistics were usually available from complete reporting, and data for different breakdowns and domains were available. Some data for social statistics were generally obtained from sample surveys, such as family budget surveys, income surveys, time use surveys, health status surveys, etc. Sample sizes were usually adequate for the country as a whole, but users required data by regions and different domains. In such cases official statisticians sometimes used various statistical methods to assess the required data. Very often they used simplified methods, but sometimes these methods were quite complex. Data from different sources were used, and some models were constructed to get the required information. I would like to give several examples from my own experience later.

Academic statisticians, mainly those teaching statistics and doing research, had a different kind of contact with data requirements and with the application of statistical methods to get the required information. Sometimes they used simplified methods to get the information required for their own research (Paradysz, 1998). However, the most important factors for them were the change of the economic system in 1989, followed by the International Conference on SAE held in Warsaw in 1992, the Riga Conference on SAE in 1999, and the EURAREA Project. They started to study the theory of SAE methods and the experience of other countries (Bracha, 1994, 1996; Domański and Pruska, 00; Golata, 2003; Kordos & Paradysz, 2000; Paradysz, 1998). Official statisticians took part in all these events, but their approach was more pragmatic.

2. Assessment of required data for domains and small areas before 1989

A historical review of SAE in Poland was presented at the Warsaw Conference in 1992 (Kalton et al., 1993; Kordos, 1994). Several Polish official statisticians participated in the assessment of the required information in different fields. They usually used simplified methods or constructed some models, and sometimes they used more sophisticated methods to obtain the required data. First of all, I would like to mention J. Wojtyniak, who constructed such a model in 1938, using data from different sources (Wojtyniak, 1938).

Official statisticians at GUS have had the opportunity to consult some more complex statistical problems with academic statisticians co-operating with the Mathematical Commission at GUS, which had a significant impact on the application of statistical methods in official statistics. Usually they consulted sample designs and estimation methods for planned sample surveys. There were also some problems connected with small area estimation, but usually these problems were solved by sample allocation and the application of simplified methods (Pawłowska, 1969). However, as mentioned above, other statistical methods were also used to get the required data when the sample size was too small to obtain reliable estimates. For example, some estimates were required from the household budget survey (HBS) by voivodship, but the sample size by voivodship was too small to give reliable estimates. In some cases special models were constructed using data not only from the HBS but also from other sources, such as the population census, the structure of wages by size, and insurance statistics (Kordos, 1959, 1963).

2.1. The GUS Mathematical Commission

The GUS Mathematical Commission was established in 1949, and was headed at the beginning by the GUS President, Prof. Stefan Szulc (1967). The Commission included academic statisticians from various universities in Poland and GUS staff (about 5 members). At that time it was the only commission in the Polish public administration which dealt with the application of mathematical methods to the observation of economic and social phenomena. The Commission was twice consulted by Prof. J. Neyman (Fisz, 1950; Zasępa, 1958) regarding sampling methods in the household budget survey and the agricultural survey, and speeding up the processing of population census results. Although the Commission played mainly an advisory and opinion-making role, its members dealt with practical matters such as the preparation of the sampling design, sample size, estimation methods and the estimation of sampling errors (GUS, 1979). A considerable number of the papers presented at the GUS Commission were published in Polish statistical journals and in special monographs (see: GUS, 1969, 97, 1979).

GUS: acronym for the Central Statistical Office of Poland.

3. Transition from a centrally planned economy to a market-oriented economy

The gradual transformation of Polish statistics began in 1989, i.e. with the change of the social and economic system. Official statistics was faced with new tasks and challenges, the implementation of which depended to a considerable

degree on, among others, effective co-operation with academic statisticians and economists. The experience gained under the previous system was useful only to a limited extent. New problems and tasks, not known before to our official statistics, emerged. Some of these problems are listed below:
- use of administrative registers for statistical purposes,
- integration of data from various sources,
- development and implementation of registers of employers,
- training of experts in the use of various international standards and classifications, and adaptation of those classifications to national conditions,
- extension of sample surveys to a broader scale, especially in economic statistics (estimation for small areas, data quality, etc.),
- extension of the methods compensating for non-response (methods of weighting the results, imputation, model approach, simulation, etc.).

In the period of transformation of statistics, in Poland as in other countries, there were considerable financial limitations, which certainly had an influence on the pace of the transformation and on the scope of co-operation with academic statisticians and economists. The shortage of highly qualified staff was a serious limitation. The previous statistical system, which was based mainly on complete reporting, simple questionnaires and formal instructions, simplified tables and limited descriptive analysis, in the majority of cases did not require modern methods of data collection, the design of sophisticated questionnaires and adequate training of the staff conducting surveys, or sophisticated methods of controlling field operations and the various stages of the implementation of statistical surveys.

Sampling statisticians at GUS realized that with the change of the economic system from a centrally planned to a market economy sampling methods would play an important role. Complete reporting in economic statistics stopped and some sample surveys began. They were fully aware of the advantages and drawbacks of sampling methods. The classical approach to sample surveys was not enough. We read about new methods in the literature. The first book I read on SAE methods was the publication of Platek et al. (1987). I studied it with great interest and under its influence prepared an article on small area statistics in Polish, published in our Wiadomości Statystyczne (Kordos, 99). I had the opportunity to discuss these problems with the Director General of Eurostat, Mr. Y. Franchet. In conclusion, I suggested organizing an international conference in Warsaw devoted to small area estimation. Eurostat accepted the proposal and sponsored it financially, together with the Central Statistical Office of Poland and the Polish Statistical Association.

4. The International Scientific Conference on Small Area Statistics and Survey Designs held in Warsaw in 1992

Both Dr. Richard Platek from Statistics Canada and Prof. Graham Kalton played a significant role in the organization and preparation of the Conference. Dr. R. Platek was chairman of the Programme Committee, and Prof. G. Kalton was at that time President of the International Association of Survey Statisticians (IASS). The IASS, together with GUS and the Polish Statistical Association, organized the Conference. The most important experts in this field from different countries were invited. Altogether, 4 invited papers (2 from Poland) and 6 contributed papers (6 from Poland) were presented. Out of the 8 papers from Poland, 6 were presented by official statisticians (J. Kordos, K. Latuch, L. Nowak, S. Szwałek & H. Zaremba, K. Świerch, and J. Witkowski), and 2 by academic statisticians (Cz. Bracha and B. Suchecki). Other Polish statisticians served as chairpersons of sessions or acted as invited discussants: 3 official statisticians (A. Ochocki, A. Szarkowski and R. Zasępa) and 6 academic statisticians (B. Górecki, A. Kurzynowski, T. Panek, J. Podgórski, A. Szulc, W. Sadowski). It is possible to conclude that the official statisticians from Poland made the major Polish contribution to the 1992 Warsaw Conference. The proceedings of the Warsaw Conference were published in 1993 (Kalton et al., 1993) and selected papers in Statistics in Transition in 1994 (vol. 1, Number 6). Polish statisticians, mainly academic, began to study not only the papers presented at the Warsaw Conference but also those from other journals and books, and began to undertake their own research. Several papers were presented during national conferences and some were published in English (e.g. Paradysz, 1998). The results of these activities were shown at the Riga Conference.

5. An International Conference on Small Area Estimation held in Riga, Latvia, in 1999

After the Warsaw Conference three academic centres started intensive research in SAE, i.e. in Warsaw (Warsaw School of Economics), Poznań (University of Economics), and Łódź (University of Łódź). Later other centres joined: Katowice (University of Economics) and Gdańsk (University of Gdańsk). The results of the undertaken research were seen at the Riga 1999 Conference. The Riga Conference was organized as a Satellite Conference of the IASS by the IASS, the Central Statistical Bureau of Latvia and the University of Latvia. The Conference was sponsored by the Australian Bureau of Statistics, the Central Statistical Bureau of Latvia, the IASS, the Ministry of Economy of the Republic of Latvia and the University of Latvia. I served as chairman of the Programme Committee.

One out of 5 invited papers and 6 out of 6 contributed papers were from Poland (Riga, 1999: invited paper: Kordos & Paradysz; contributed papers: P. Blazczak, G. Dehnel & G. Golata, J. Kubacki, K. Pruska, A. Witkowska and J. Wywial). Of the three academic centres, Poznań University of Economics was the most active (3 papers). It is clear that the 1992 Warsaw Conference had a significant impact on SAE research by academic statisticians in Poland. Two sets of papers from the Riga International Conference (Riga, 1999) were published in Statistics in Transition (vol. 4, Number 4, 2000, and vol. 4, Number 5, 2000).

There were a few attempts to apply SAE methods to measure the extent of unemployment, poverty and household structure, and attempts to apply SAE methods in agriculture-related surveys; some books and articles have also been published (Domański & Pruska, 00; Golata, 2003; Kordos & Paradysz, 2000; Paradysz, 00; Pekasiewicz & Pruska, 00).

6. The EURAREA project (2001-2004)

Preparation for the EURAREA project started in late 1999. Polish statisticians were invited to join the project. The preliminary programme of the project was discussed at the London Conference at the beginning of May 2000. As a representative of Poland, I participated in this Conference and was impressed by the programme of the Project and its methodology. To popularize the Project among Polish statisticians, I published an article in Polish on its purposes and its technical and methodological aspects (Kordos, 2000). Finally, the team from Poznań University of Economics joined the Project, with Prof. Jan Paradysz as team leader. I co-operated with the team only in the first phase of the Project. Later I had my own programme, focusing mainly on small area statistics and data quality, supervising several doctoral theses in SAE, and doing some research of my own.

The EURAREA project, funded by the EU, was intended both for research on the technical aspects of SAE from survey data, and to provide Eurostat and the European National Statistical Institutes with broad recommendations for statistical policy on SAE (Heady and Hennell, 00; Heady and Ralphs, 2004). Accordingly, the purpose of the EURAREA project was to investigate the performance of standard and innovative methods for SAE in the European context, and to provide advice to Eurostat and to the European National Statistical Institutes on the appropriate use of SAE methods in the context of official statistics. The full range of results, including methodological findings, specimen programs and recommendations regarding statistical policy, was published in the EURAREA project reference volume and was made available on the project website (Heady and Ralphs, 2004). These, and all of the SAS programs developed in EURAREA, are available for downloading. We hope to use them in our future research.

The Polish contribution to the EURAREA project

As mentioned above, the group from Poznań University of Economics joined the Project: Prof. Jan Paradysz, Dr. Grażyna Dehnel, Dr. Elżbieta Golata and Dr. Tomasz Klimanek. A number of reports were prepared by the Poznań team and some were published in Statistics in Transition (Dehnel et al., 2004; Golata, 2004). Several articles prepared by the Poznań team were published in Polish and presented during national conferences devoted to regional statistics. Dr. E. Golata published in Polish a comprehensive monograph on indirect estimation of unemployment on the local labour market (Golata, 2004), in which SAE methods were used extensively. The report prepared for the EURAREA project by Dehnel, Golata and Klimanek, also published in Statistics in Transition (Dehnel et al., 2004), was considered an important contribution to the project (Heady, Ralphs, 2006). Two papers prepared by the Poznań team were presented at the SAE2005 Conference (Dehnel & Golata, 2006; Paradysz & Klimanek, 2006).

For the purpose of the EURAREA validation program, a special database was set up. The Polish database, the so-called super-population labelled POLDATA, was created on the basis of three data sources: the 1995 Microcensus, the 1995 Household Budget Survey and the Local Data Bank. POLDATA provided real information about the target variables and represented as closely as possible the characteristics of Poland in 1995 with respect to the new administrative division of the country, which was introduced in January 1999 (Golata, 2004).

7. Attempts at applying SAE methods in several fields

After the Riga Conference (Riga, 1999) several attempts were made in Poland to apply various SAE methods in different fields. The contribution of the Poznań team has already been mentioned. Here, I would like to mention briefly the application of SAE methods for estimating some characteristics related to employment and unemployment, small business, and agriculture by region, subregion and poviat (county) (NUTS 2, NUTS 3 and NUTS 4).

7.1. Estimation of some employment and unemployment characteristics by region and poviat (county)

Some interesting results of applying SAE methods for estimating some employment and unemployment characteristics by region, sub-region and poviat (county), using LFS results and the 2002 Population Census, were obtained by Cz. Bracha, B. Lednicki and R. Wieczorkowski (Bracha et al., 2003). They used the Polish Labour Force Survey (PLFS) and the 2002 Census of Population and Housing (2002 CPH) for estimating some employment and unemployment characteristics in Poland. The SAE methods were used to estimate these parameters by region, sub-region and poviat (NUTS 2,

NUTS 3 and NUTS 4). The authors used direct, synthetic and composite estimators. The PLFS started in 1992 (Szarkowski and Witkowski, 1994), and was redesigned according to Eurostat requirements in the fourth quarter of 1999 (Eurostat, 1998). Direct estimates were obtained from the PLFS, and appropriate data from the 2002 CPH were used as auxiliary information. An efficiency analysis of the direct, synthetic and composite estimators was conducted. The composite estimator was a combination of the direct and synthetic estimators with equal weights. This paper is in Polish, but a discussion of the results obtained by the authors (Bracha et al., 2003) is presented in an article by Kubacki (2006).

In the second paper by the same authors (Bracha et al., 2004), the composite estimator uses data from the PLFS 2003 together with data from administrative sources that are available on the Polish Public Statistics web pages. They compare these with similar estimates based on Census 2002 data, which may reveal the relative usefulness of census and administrative data.

Interesting results are given in the paper by Kubacki (2006). The author presents a synthetic review of the methodology and results considered in Bracha et al. (2003, 2004), together with some results of his own. The author discusses various methods of estimation and the evaluation of the quality of such estimation, related in particular to the type of auxiliary data used for borrowing strength and to the efficiency of the initial estimates used in the models (Kubacki, 2000, 2004). The author presents the application of Hierarchical Bayes (HB) methods to the estimates of unemployment size for small areas using the Polish Labour Force Survey (PLFS) and auxiliary information (Kubacki, 2004). The constructed model includes data obtained from the published results of the PLFS for regions in Poland and 2002 Census data.

7.2. Estimation of some characteristics in small business statistics

The Central Statistical Office of Poland annually conducts sample surveys generally designed to provide reliable direct estimates at the level of geographic regions and major domains (subgroups) of the population of interest, generally defined by economic activity and size class (defined by the number of employees). Direct sample estimates for small domains (e.g. defined at sub-regional level for the enterprises of a given four-digit NACE economic activity) are often likely to yield unacceptably large standard errors because the sample size for the area is unduly small. A sample survey of the smallest enterprises is carried out each year with a sample size of about 5 percent. A research project was started by the team involved in EURAREA under the leadership of J. Paradysz, trying to apply SAE methods for estimates by region, sub-region and poviat. A pseudo-population was created to study the efficiency of different types of SAE estimators. The same seven types of estimators were used as in the EURAREA project. Personal tax data were used as auxiliary information. A report on this research was published in Polish

114 STATISTICS IN TRANSITION, March (Paraysz, 003). Besies, two papers were presente at the SAE005 Conference an are inclue in this issue. The first paper (Paraysz & Klimanek, 006) eals with a tentative application of the EURAREA approach to small business statistics in Polan. The stuy was aime at accounting for an applying tax ata for a more effective use of a sample survey of small businesses with up to 9 employees. To achieve this aim, several more specific objectives an tasks were set out: ientifying available sources of information about economic activities of small businesses (SP3 survey, Database of Statistical Units an tax register) an assessing the possibility of integrating them; etermining the applicability of inirect estimation methos in compliance with the stanars evelope within the EURAREA project; analysing the precision of inirect estimates in comparison with traitional estimation techniques. After a thorough analysis, the authors prove that it is impossible to apply automatically the EURAREA approach to small business statistics ue to marke ifferences between statistical unit istributions. The application of small omain estimation to small business statistics also nees more tax information over a longer perio. The secon paper (Dehnel & Golata, 006) presents the first attempts to use aministrative ata sources an inirect estimation techniques to estimate basic economic information about small business in the joint cross-section of Polish Classification of Economic Activities PKD an voivoships. The stuy objective, specifie as accounting for an applying tax ata for a more effective use of a survey of small businesses with up to 9 employees, was unerstoo in a twofol manner. First of all, it was the verification of the hypothesis concerning the possibility of improving estimation precision in stuies available to ate. Seconly, it was intene as a possible extension of estimation scope by joint istribution by voivoship an economic activity (PKD ivision). 
The basic economic information, for the purposes of this study, was limited to employment and revenues. The results obtained in the study entitle the authors to draw the following conclusion. The application of indirect estimation to small business data requires considering the heterogeneity of its distribution. Nevertheless, the results of the study show the practical possibilities and benefits of adopting small area estimation techniques for small business data in Poland. These are the first results of an attempt to apply SAE methods in small business statistics. Research in this field is also going on in GUS with the cooperation of the Poznań group involved in regional statistics.

7.3. Estimation of some agricultural characteristics by region and poviat using agricultural sample surveys and agricultural census data

The first results of applying SAE methods on an experimental basis in agriculture in Poland were published by Kordos and Paradysz (2000). They relate to the estimation of some livestock numbers and areas under selected crops in 1998, using direct estimates and empirical Bayes estimates by poviat. A much more comprehensive study was undertaken by Bartosińska (2005). Using direct estimates from agricultural sample surveys in 1998 and 00, and the respective data from the 1996 Census of Agriculture as auxiliary information, empirical Bayes (EB) and hierarchical Bayes (HB) estimates were calculated and compared by poviat (NUTS4). The main purposes of the study were as follows:
- estimation of selected agricultural characteristics by poviat (NUTS4) using the agricultural sample survey and the last Census of Agriculture;
- matching of selected holdings in the agricultural sample surveys (in 1998 and 00) with the 1996 Census of Agriculture (CA1996);
- use of characteristics from CA1996 as auxiliary variables;
- application of unit-level and area-level models in the estimation procedure;
- studying the ecological effect;
- suggesting an estimation method for selected characteristics (totals and means).

Main conclusions from the research:
- it is possible to match holdings from the sample with the last census of agriculture;
- auxiliary data from the last census can significantly improve estimates for small areas (NUTS4);
- fitting the unit-level regression model can give quite different results from fitting the area-level regression model;
- it is advisable to fit a unit-and-area level equation whenever possible;
- EB estimation can be applied by NUTS4;
- more experiments are needed in this field.

A more comprehensive paper on the results of this research is going to be published in our journal soon.

8. National statistical conferences

Some results of the EURAREA project and findings of individual statisticians in the field of SAE were presented at Polish statistical conferences and seminars. I would like to mention only the conferences organized at the country level. There are three such conferences, devoted to:

- regional statistics, organized by the Poznań University of Economics (by Prof. J. Paradysz and his team);
- sampling methods, organized by the Katowice University of Economics (by Prof. J. Wywiał);
- multivariate statistical analysis, organized by Łódź University (by Prof. Cz. Domański).

Some reports from these conferences were published in Statistics in Transition.

9. International conferences and seminars

There were several international conferences or seminars where Polish statisticians (official and academic) presented papers on their findings in SAE. Some of them are mentioned below.

1. At the ISI 54th Session in Berlin, in August 2003, one of the invited paper meetings was devoted to Small Area Design and Estimation. I served as organizer and chairman of this meeting, and E. Golata presented an invited paper entitled Attempts.

2. At the European Conference on Quality and Methodology in Official Statistics (Q2004), Germany, 2004, I presented a paper on some aspects of small area statistics and data quality (Kordos, 2005).

3. During the seminar with Russian statisticians held in Warsaw, July 4 6, 2005, the following issues connected with the EURAREA Project and SAE methods were presented and discussed (the statistician's name is given in brackets):
- Genesis, assumptions and implementation of the EURAREA Project (J. Kordos).
- Synthetic results and conclusions from the EURAREA Project (J. Paradysz).
- Utilization of experiences from the EURAREA Project for estimation in small domains in small business statistics (G. Dehnel).
- Application of the model approach of SAE to small business estimates (W. Niemiro).
- Some simulation methods for small business estimates (T. Piasecki).
- Some theoretical considerations for synthetic and composite estimators (J. Wesołowski).
- Some aspects of sample allocation (R. Wieczorkowski).

4. At the SAE2005 Conference in Finland the following papers from Poland were presented:
a) Invited papers: J. Kordos, Impact of the EURAREA project on research in small area estimation in Poland; J. Paradysz & T. Klimanek, Adaptation of EURAREA experience in business statistics in Poland.
b) Contributed papers: G. Dehnel & E. Golata, Attempts to estimate basic information for small business in Poland; M. Gamrot, Estimation of a domain mean under nonresponse using double sampling; K. Pruska, Logistic regression models in small area investigations; T. Żądło, On Mean Square Error of EBLU Predictors based on the Formula of Royall's BLU Predictor.

10. Concluding remarks

In presenting and discussing the different factors and events which have had a major impact on the development of SAE methods in Poland, I have confined myself to the available documentation and facts. However, I realise that my presentation, comments, assessments and conclusions are subjective and given from my point of view. Since I have been involved in Polish statistics for the last fifty years, I had an opportunity to take part in these events as an official statistician and partly as an academic statistician. I also worked at FAO and the World Bank as an expert and a consultant for nearly eight years. Working abroad I always had some problems with SAE. During the last fifteen years in Poland we have been involved in SAE methods to some extent, first studying the theory and practice of other countries, and then trying to experiment with some methods in practice. This period reminds me of our involvement in sampling methods in the 60s and 70s. Even under a different economic system we tried to introduce sampling methods in social statistics. First we were mainly involved in sampling errors; next, non-sampling errors and the quality of statistical data were investigated. I think that we devoted too much time to sampling errors, neglecting different kinds of non-sampling errors and data quality assessment.
It seems to me that in the last fifteen years we have been involved mainly in the sampling aspects of SAE, neglecting other sources of error. The last fifteen years of involvement in SAE methods may be treated as a first phase of recognising the philosophy and methodology. The usefulness of SAE methods for official statisticians is very important. At present it is too early to apply these methods in official statistics on a larger scale. Additional research in

the field of SAE is needed. Additional experiments with applying SAE methods in different fields are also needed. In the future small area estimation should not be treated as a separate branch of official statistics but should become an integrated part of the statistical system, with a common philosophy and methodological foundation. Some positive development is observed in Polish official statistics regarding SAE methods. At GUS, in one of its divisions, a group of mathematicians has been set up which is deeply involved in SAE methods. Co-operation of the regional statistics group in Poznań (Prof. J. Paradysz) with the appropriate divisions at GUS in the field of SAE methods is becoming closer than before.

REFERENCES

BARTOSIŃSKA, D. (2005), Application of Small Area Estimation Methods for some Agricultural Characteristics using Agricultural Sample Surveys and Census of Agriculture Data (in Polish), (mimeo).
BRACHA, CZ. (1994), Methodological Aspects of Small Area Surveys. Z prac Zakładu Badań Statystyczno-Ekonomicznych, No. 43, 45 pages (in Polish).
BRACHA, CZ. (1996), Small Area Statistics. In: Theoretical Background of Sampling Methods. PWN, Warszawa (in Polish).
BRACHA, CZ., LEDNICKI, B. and WIECZORKOWSKI, R. (2003), Estimation of Data from the Polish Labour Force Surveys by poviat (counties) in , Central Statistical Office of Poland, Warsaw, 97p. (in Polish).
BRACHA, CZ., LEDNICKI, B. and WIECZORKOWSKI, R. (2004), Utilization of Composite Estimation Methods from the Labour Force Survey by region and poviat in 2003, Research Centre of the Central Statistical Office and Polish Academy of Sciences, No. 99, Warsaw 2004 (in Polish).
BRACKSTONE, H. (1999), Managing Data Quality in a Statistical Agency. Survey Methodology, vol. 25, No., pp
DEHNEL, G. and GOLATA, E. (2006), Attempts to Estimate Basic Information for Small Business in Poland. Statistics in Transition, Vol. 7, Number 4, pp.
DEHNEL, G., GOLATA, E. and KLIMANEK, T. (2004), Consideration on Optimal Design for Small Area Estimation, Statistics in Transition, vol. 6, Number 5, pp
DOMAŃSKI, CZ., PRUSKA, K. (00), Methods of Small Area Statistics, Wydawnictwo Uniwersytetu Łódzkiego, Łódź, 6 pages (in Polish).
EUROSTAT (2003). Definition of Quality in Statistics. Eurostat Working Group on Assessment of Quality in Statistics, Luxembourg, 3 October 2003.

FISZ, M. (1950), Consultation with prof. Neyman and Conclusions. "Studia i Prace Statystyczne", nr 3-4 (in Polish).
GAMBINO, J. and DICK, P. (2000), Small Area Estimation Practice at Statistics Canada, Statistics in Transition, t. 4, Nr 4, pp
GHOSH, M. (00), Model-dependent Small Area Estimation Theory and Practice. In: Lecture Notes on Estimation for Population Domains and Small Areas (R. Lehtonen, K. Djerf, eds.), pp
GHOSH, M. and RAO, J.N.K. (1994). Small Area Estimation: An Appraisal. Statistical Science, Vol. 9, No. 1, pp
GOLATA, E. (2004), Indirect Estimation of Unemployment on Local Labour Market, Wydawnictwa Akademii Ekonomicznej w Poznaniu (in Polish).
GOLATA, E. (2004a), Problems of Estimate Unemployment for Small Domains in Poland, Statistics in Transition, vol. 6, Number 5, pp
GUS (1969), Application of Mathematical Methods in Statistics, Warszawa (in Polish).
GUS (97), Selected Methodological Problems of Sampling Surveys, Warszawa (in Polish).
GUS (1979), Methodology of Sampling Surveys in CSO - Works of Mathematical Commission, Warsaw (in Polish).
HEADY, P., HENNELL, S. (00), Enhancing Small Area Estimation Techniques to Meet European Needs, Statistics in Transition, vol. 5, Number,00, pp
HEADY, P. and RALPHS, M. (2004), Some Findings of the Eurarea Project and their Implications for Statistical Policy, Statistics in Transition, vol. 6, Number 5,
KALTON, G., KORDOS, J. and PLATEK, R. (1993). Small Area Statistics and Survey Designs, Vol. I: Invited Papers; Vol. II: Contributed Papers and Panel Discussion. Central Statistical Office, Warsaw.
KORDOS, J. (1959), Assessment of Non-agricultural Population by Income groups, Wiadomości Statystyczne, No. 3, pp. (in Polish).
KORDOS, J. (1963), Distribution of Non agricultural Population according to Income Per Capita in 1960, "Biuletyn Komitetu Przestrzennego Zagospodarowania Kraju PAN", nr 8, Warszawa (in Polish).
KORDOS, J. (1985), Towards an Integrated System of Household Surveys in Poland, Bulletin of the International Statistical Institute (invited paper), vol. 5, Amsterdam, Book, (1985), pp

KORDOS, J. (1990), Research on Income Distribution by Size in Poland. In: Studies in Contemporary Economics, C. Dagum, M. Zenga (Eds), Income and Wealth Distribution, Inequalities and Poverty. Springer-Verlag Berlin Heidelberg, pp
KORDOS, J. (99), Small Area Statistics and Sample Surveys, Wiadomości Statystyczne, No. 4 (in Polish).
KORDOS, J. (1994), Small Area Statistics in Poland (Historical Review), Statistics in Transition, vol., Number 6, pp
KORDOS, J. (1995), Transformation of Polish Statistics Challenges and Limitations, Wiadomości Statystyczne, No. 6, pp. 5. (in Polish).
KORDOS, J. (1999), Academic and Official Statistics Co-operation: the Polish Experience. In: Proceedings of the Seminar on Academic and Official Statistics co-operation held in Bucharest in September 1998. European Communities, 1999, pp
KORDOS, J. (2000), New Project for Small Area Estimation, Wiadomości Statystyczne, No. 8, pp. 0. (in Polish).
KORDOS, J. (2003), Improvement Programme of Statistics Quality, Wiadomości Statystyczne, Nr 7/8, pp (in Polish).
KORDOS, J. (2005), Some Aspects of Small Area Statistics and Data Quality, Statistics in Transition, vol. 7, Number, pp.0-3.
KORDOS, J. and PARADYSZ, J. (2000), Some Experiments in Small Area Estimation in Poland, Statistics in Transition, Vol. 4, Number 4, pp
KUBACKI, J. (2004), Application of the Hierarchical Bayes Estimation to the Polish Labour Force Survey, Statistics in Transition, Vol. 6, Number 5, pp
KUBACKI, J. (2006), Remarks on Using the Polish LFS Data for Unemployment Estimation by County, Statistics in Transition, Vol. 7, Number 4, pp.
MARKER, D. (2001), Producing Small Area Estimates From National Surveys: Methods for Minimizing use of Indirect Estimators, Survey Methodology, Vol. 27, No., pp
NEYMAN, J. (1933), An Outline of the Theory and Practice of Representative Method Applied in Social Research. Instytut Spraw Społecznych, Warszawa (in Polish).
NEYMAN, J. (1934), On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection, Journal of the Royal Statistical Society, NR 97, pp

PARADYSZ, J. (1998), Small Area Statistics in Poland First Experiences and Application Possibilities, Statistics in Transition, Vol. 3, Number 5, pp
PARADYSZ, J. (00), Small Area Estimation in Regional Statistics. In: A. Zeliaś (Ed.), Time and Space Modelling and Forecasting of Economical Phenomena. Wydawnictwo Akademii Ekonomicznej w Krakowie, pp (in Polish).
PARADYSZ, J. and KLIMANEK, T. (2006), Adaptation of EURAREA Experience in Business Statistics in Poland. Statistics in Transition, Vol. 7, Number 4, pp.
PAWŁOWSKA, J. (1969), Efficiency Measurement of Sample Designs and Estimation Methods in Livestock Surveys. In: Application of Mathematical Methods in Statistics, BWS, Vol. 7, GUS, Warszawa, pp (in Polish).
PEKASIEWICZ, D., PRUSKA, K. (00), Analysis of Distribution of Some Estimators in Small Area Statistics, Folia Oeconomica, 56, pp. 9.
PLATEK, R., RAO, J.N.K., SÄRNDAL, C.E. and SINGH, M.P. (Eds.) (1987), Small Area Statistics, John Wiley & Sons, New York.
PFEFFERMANN, D. (2002), Small Area Estimation New Developments and Directions. International Statistical Review 70, pp
RAO, J.N.K. (2003), Small Area Estimation, John Wiley & Sons, New Jersey.
RIGA (1999), Small Area Estimation Conference Proceedings, Riga, Latvia, August 1999.
SÄRNDAL, C.-E., SWENSSON, B., WRETMAN, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York, Berlin, Heidelberg, London, Paris, Tokyo.
SPJØTVOLL, E. and THOMSEN, I. (1988), Applications of some empirical Bayes methods to small area statistics, 46th Session of the International Statistical Institute (invited paper).
STATISTICAL POLICY OFFICE (1993), Indirect Estimators in Federal Programs, Subcommittee on Small Area Estimation, Statistical Policy Working Papers.
STATISTICS CANADA (1998). Statistics Canada Quality Guidelines. Third Edition. October 1998.
SZARKOWSKI, A., WITKOWSKI, J. (1994), The Polish Labour Force Survey, Statistics in Transition, Vol., No. 4, pp
SZULC, S. (1967), Statistical Methods (in English and Polish), PWE, Warszawa

TREWIN, D. (2002), The importance of a Quality Culture, Survey Methodology, vol. 28, No., pp
U.K. GOVERNMENT STATISTICAL SERVICE (1997), Statistical Quality Check List. London: U.K. Office for National Statistics.
WIŚNIEWSKI, J. (1934), Income Distribution by Size, Instytut Badania Koniunktur Gospodarczych i Cen, Warszawa (in Polish).
WOJTYNIAK, J. (1938), Distribution of Workers Family by Income, Statystyka Pracy, no. 3 (in Polish).
ZASĘPA, R. (1958), Problems of Sampling Surveys of GUS in the light of Consultation with prof. J. Neyman. "Wiadomości Statystyczne", nr 6, pp. 7. (in Polish).

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4, pp.

EXACT DISTRIBUTION OF THE NATURAL ARPR ESTIMATOR IN SMALL SAMPLES FROM INFINITE POPULATIONS

Ryszard Zieliński
Institute of Mathematics, Polish Academy of Sciences, Warszawa, Poland

ABSTRACT

In the European Commission Eurostat document Doc. IPSE/65/04/EN page, the "at-risk-of-poverty rate" (ARPR) is defined as the fraction of persons in a given population with the equivalised disposable income smaller than p percent (p = 60) of the q-th population quantile (q = 0.5, the population median). A natural estimator of ARPR is the fraction of persons in a sample with income smaller than p percent of the q-th sample quantile. In the note we present the exact distribution of the estimator as well as the exact formulas for its expectation and its variance. Numerical examples illustrate the results for some population distributions.

1. Introduction

In the European Commission Eurostat document Doc. IPSE/65/04/EN page, the at-risk-of-poverty rate (ARPR) is defined as follows. Let $EQ\_INC_i$ denote the equivalised disposable income of person $i$ and let $weight_i$ denote the weight of person $i$. The at-risk-of-poverty threshold (ARPT) is calculated as 60% of the calculated median value, i.e.

$$\mathrm{ARPT} = \text{at-risk-of-poverty threshold} = 60\% \cdot EQ\_INC_{\mathrm{MEDIAN}},$$

where (with persons ordered by increasing $EQ\_INC$)

$$EQ\_INC_{\mathrm{MEDIAN}} =
\begin{cases}
\tfrac{1}{2}\,\bigl(EQ\_INC_j + EQ\_INC_{j+1}\bigr), & \text{if } \sum_{i=1}^{j} weight_i = \tfrac{W}{2},\\[4pt]
EQ\_INC_{j+1}, & \text{if } \sum_{i=1}^{j} weight_i < \tfrac{W}{2} < \sum_{i=1}^{j+1} weight_i,
\end{cases}$$

and

$$W = \sum_{\text{all persons}} weight_i .$$

Then the at-risk-of-poverty rate is calculated as the percentage of persons (over the total population) with an equivalised disposable income below the at-risk-of-poverty threshold (i.e. the equivalised disposable income of each person is compared with the at-risk-of-poverty threshold). The cumulated weights of persons whose equivalised disposable income is below the at-risk-of-poverty threshold are divided by the cumulated weights of the total population (i.e. the sum of all the personal weights):

$$\mathrm{ARPR} = \frac{\sum_{\text{all persons with } EQ\_INC < \mathrm{ARPT}} weight_i}{\sum_{\text{all persons}} weight_i} \times 100 .$$

A difficulty which arises is that we are interested in the population ARPT, but what we are able to calculate is the sample ARPT, which may be considered as an estimate

$$\widehat{\mathrm{ARPT}} = 60\% \cdot EQ\_INC_{\mathrm{SAMPLE\ MEDIAN}}$$

of the population ARPT. Then

$$\widehat{\mathrm{ARPR}} = \frac{\sum_{\text{all persons in the sample with } EQ\_INC < \widehat{\mathrm{ARPT}}} weight_i}{\sum_{\text{all persons in the sample}} weight_i} \times 100 . \tag{1}$$

The problem is to assess how the estimate is related to the population value of ARPR. The variance of the estimator is one of the measures of concern; see for example Berger and Skinner (2003), Kovačević and Binder (1997), or Deville (1999). In what follows we consider, in a theoretical setup, the exact distribution and the first two moments of (1); the exact formula for the variance is obtained as a special case.

To get a general insight into the problem we consider the following theoretical model. Let $X$ be a positive random variable with an unknown continuous and strictly increasing distribution function $F$ with $F(0) = 0$ and $F(x) > 0$ for $x > 0$. The corresponding density function will be denoted by $f$. Let $X_1, X_2, \dots, X_n$ be a random sample from the distribution $F$. For some given $q \in (0,1)$ and $\alpha \in (0,1)$, the problem is to estimate the fraction of the population below the $\alpha$ multiple of the $q$-th quantile of the distribution $F$:

$$F\bigl(\alpha F^{-1}(q)\bigr) = P_F\bigl\{X \le \alpha F^{-1}(q)\bigr\}.$$

In the original problem above $q = 0.5$ (the median) and $\alpha = 0.6$.
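The Eurostat rule quoted above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `weighted_median` and `arpr` are hypothetical, persons are sorted by income before cumulating weights, and the equality case of the median rule is tested with a floating-point tolerance.

```python
import math


def weighted_median(incomes, weights):
    """Weighted median following the two-case rule quoted in the text."""
    pairs = sorted(zip(incomes, weights))      # order persons by income
    half = sum(w for _, w in pairs) / 2.0      # W / 2
    cum = 0.0
    for j, (income, w) in enumerate(pairs):
        cum += w
        # Cumulated weight equals W/2: average this income and the next one.
        if math.isclose(cum, half) and j + 1 < len(pairs):
            return 0.5 * (income + pairs[j + 1][0])
        # Cumulated weight first exceeds W/2: this income is the median.
        if cum > half:
            return income
    return pairs[-1][0]


def arpr(incomes, weights, share=0.6):
    """Return (ARPT, ARPR in percent) for a weighted sample."""
    threshold = share * weighted_median(incomes, weights)   # ARPT
    below = sum(w for x, w in zip(incomes, weights) if x < threshold)
    return threshold, 100.0 * below / sum(weights)


if __name__ == "__main__":
    # Five persons with equal weights: the median is 30, so ARPT is about 18
    # and one person out of five falls below the threshold.
    print(arpr([10, 20, 30, 40, 100], [1, 1, 1, 1, 1]))
```

With equal weights the function reduces to the unweighted sample quantities used in the theoretical model that follows.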
The natural estimator has the form $W = K/n$, where

$$K = \max\bigl\{\, j : X_{j:n} \le \alpha X_{M:n} \,\bigr\}, \qquad M = [nq] + 1,$$

and $X_{1:n}, X_{2:n}, \dots, X_{n:n}$ ($X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$) are the order statistics from the sample $X_1, X_2, \dots, X_n$. Observe that both $F\bigl(0.6\,F^{-1}(0.5)\bigr)$ and $\max\{\, j : X_{j:n} \le 0.6\,X_{[n/2]+1:n} \,\}/n$ are in full agreement with ARPR and its estimate (1), respectively. Below we construct the exact distribution of $K$ as well as its expected value and variance. Numerical examples accompanied by simulation results illustrate the solution for some interesting populations $F$.

2. Exact distribution of K from a sample of size n

For a given integer $M$ ($1 \le M \le n$) and a real $\alpha \in (0,1)$ consider the following probabilities

$$q_k = P\bigl\{X_{k:n} \le \alpha X_{M:n}\bigr\}, \qquad k = 1, 2, \dots, M. \tag{2}$$

The distribution of $K$ is given by the formula

$$P\{K = k\} = q_k - q_{k+1}, \qquad k = 0, 1, \dots, M-1, \tag{3}$$

with $q_0 = 1$ and $q_M = 0$.

If the sample $X_1, X_2, \dots, X_n$ comes from a distribution $F$ with density $f$, then (2) is given by the formula (see for example David and Nagaraja 2003)

$$\begin{aligned}
q_k &= \frac{n!}{(k-1)!\,(M-k-1)!\,(n-M)!} \int_0^{\infty} \bigl(1 - F(y)\bigr)^{n-M} f(y)\,dy \int_0^{\alpha y} F^{k-1}(x)\,\bigl[F(y) - F(x)\bigr]^{M-k-1} f(x)\,dx \\
&= \frac{n!}{(k-1)!\,(M-k-1)!\,(n-M)!} \int_0^{1} (1 - v)^{n-M}\,dv \int_0^{\Psi(v)} u^{k-1}\,(v - u)^{M-k-1}\,du .
\end{aligned} \tag{4}$$

Here and further on, for a fixed $\alpha$, $\Psi(x) = F\bigl(\alpha F^{-1}(x)\bigr)$.

The probability can easily be calculated with standard mathematical software. As an example, the following Table presents distributions of $K$ for some distributions $F$ and parameters $n = 10$, $M = 6$ (the sample median), and $\alpha = 0.6$:

Distribution F    U(0,1)    Γ(,)    Γ(7,)    Par(0.)    Par(5.0)
Ψ(q)
P{K=0}
P{K=1}
P{K=2}
P{K=3}
P{K=4}
P{K=5}

Here $U(0,1)$ is the standard uniform distribution on the interval $(0,1)$, $\Gamma(a,b)$ is the Gamma distribution with shape parameter $a$ and scale parameter $b$, and $\mathrm{Par}(\beta)$ is the Pareto distribution with distribution function of the form $1 - (1 + x)^{-\beta}$. Observe that the distribution of $K$ depends significantly on the parent distribution $F$.

3. Expected value of K

The value $\Psi(q) = F\bigl(\alpha F^{-1}(q)\bigr)$ to be estimated is the expected value of the random variable $\max\{\, j : X_{j:n} \le \alpha F^{-1}(q) \,\}/n$. The support of $\max\{\, j : X_{j:n} \le \alpha F^{-1}(q) \,\}$ is $\{0, 1, \dots, n\}$, while our estimator is based on the random variable $K$ with support $\{0, 1, \dots, M-1\}$. It appears however that the expectation of $W = K/n$ is almost equal to $\Psi(q)$. By (3) we obtain

$$E(K) = \sum_{k=0}^{M-1} k\,(q_k - q_{k+1}) = \sum_{k=1}^{M-1} q_k ,$$

and by (4)

$$\sum_{k=1}^{M-1} q_k = \int_0^1 (1 - v)^{n-M}\,dv \int_0^{\Psi(v)} \sum_{k=1}^{M-1} \frac{n!}{(k-1)!\,(M-k-1)!\,(n-M)!}\, u^{k-1} (v - u)^{M-k-1}\,du .$$

Now

$$\sum_{k=1}^{M-1} \frac{n!}{(k-1)!\,(M-k-1)!\,(n-M)!}\, u^{k-1} (v - u)^{M-k-1}
= \frac{n!}{(M-2)!\,(n-M)!} \sum_{i=0}^{M-2} \binom{M-2}{i} u^{i} (v - u)^{M-2-i}
= \frac{n!}{(M-2)!\,(n-M)!}\, v^{M-2},$$

and eventually

$$E(K) = \frac{n!}{(M-2)!\,(n-M)!} \int_0^1 \Psi(v)\, v^{M-2} (1 - v)^{n-M}\,dv .$$

Hence the expectation $E(W) = E(K)/n$ is identical with the expectation of the function $\Psi(V)$ of a random variable $V$ with the distribution $\mathrm{Beta}(M-1,\ n-M+1)$ and moments

$$E(V) = \frac{M-1}{n}, \qquad \mathrm{Var}(V) = \frac{(M-1)(n-M+1)}{n^2 (n+1)} . \tag{5}$$

Denote $Q(x) = F^{-1}(x)$ and note that the first derivative of $Q(x)$ is $Q'(x) = 1 / f\bigl(Q(x)\bigr)$. Then

$$\Psi'(x) = \frac{\alpha f\bigl(\alpha Q(x)\bigr)}{f\bigl(Q(x)\bigr)}$$

and

$$\Psi''(x) = \frac{\alpha}{f^{3}\bigl(Q(x)\bigr)} \Bigl[\alpha f\bigl(Q(x)\bigr)\, f'\bigl(\alpha Q(x)\bigr) - f'\bigl(Q(x)\bigr)\, f\bigl(\alpha Q(x)\bigr)\Bigr] .$$

Consider the Taylor series of the function $\Psi(x)$ around the point $x = q$:

$$\Psi(x) = \Psi(q) + \Psi'(q)\,(x - q) + \tfrac{1}{2}\,\Psi''(q)\,(x - q)^2 + \dots$$

Then

$$E(W) = E\,\Psi(V) = \Psi(q) + \Psi'(q)\,E(V - q) + \tfrac{1}{2}\,\Psi''(q)\,E(V - q)^2 + \dots$$

For $M = [nq] + 1$ we have

$$E(W) = \Psi(q) + \Bigl(\frac{[nq]}{n} - q\Bigr) \Psi'(q) + \frac{1}{2}\,\frac{[nq]\,(n - [nq])}{n^2 (n+1)}\,\Psi''(q) + \dots$$

If $M - 1 = nq$ is an integer then $E(V) = q$, $\mathrm{Var}(V) = q(1-q)/(n+1)$, and

$$E(W) = \Psi(q) + b_F(q) + o\bigl(n^{-1}\bigr),$$

where

$$b_F(q) = \frac{1}{2}\,\frac{q(1-q)}{n+1}\,\Psi''(q)$$

is the main term of the bias of the estimator, and $o(n^{-1})$ stands for all terms of the order $n^{-2}$ or smaller. The values of $(n+1)\,b_F(q)$ for some distributions $F$ and for $q = 0.5$, $n$ even (so that $nq$ is an integer), and for $\alpha = 0.6$ are presented in the following Table

F           (n + 1) b_F(0.5)
U(0,1)      0
Γ(,)
Γ(7,)
Par(0.)
Par(5.0)

One can easily see that even for not very large $n$ the bias is rather small and that asymptotically the estimator is unbiased.

4. The variance of K

For the factorial moment $E\,K(K-1)$ we have

$$E\,K(K-1) = \sum_{k=0}^{M-1} k(k-1)\,(q_k - q_{k+1}) = 2 \sum_{k=1}^{M-1} (k-1)\,q_k$$

and hence, by (4),

$$E\,K(K-1) = \frac{2\,n!}{(n-M)!\,(M-2)!} \int_0^1 (1 - v)^{n-M}\, v^{M-2}\,dv \int_0^{\Psi(v)} \sum_{k=0}^{M-2} k \binom{M-2}{k} \Bigl(\frac{u}{v}\Bigr)^{k} \Bigl(1 - \frac{u}{v}\Bigr)^{M-2-k}\,du .$$

The sum in the inner integral can be considered as the mean of the binomial distribution with parameters $M-2$ and $u/v$, which equals $(M-2)\,u/v$, so that eventually

$$E\,K(K-1) = \frac{n!}{(M-3)!\,(n-M)!} \int_0^1 \Psi^2(v)\, v^{M-3} (1 - v)^{n-M}\,dv ,$$

and the exact formula for the variance of $K$ is

$$\mathrm{Var}(K) = \frac{n!}{(M-3)!\,(n-M)!} \int_0^1 \Psi^2(v)\, v^{M-3} (1 - v)^{n-M}\,dv + E(K) - \bigl(E(K)\bigr)^2 .$$

5. Numerical results

In the tables below we present numerical results for $q = 0.5$ (the median) and $\alpha = 0.6$; these are the parameters chosen by Eurostat. Samples of rather small sizes ($n = 10, 50, 100$) are treated. In parentheses simulation results are presented; all

simulation results are based on $sym = 10^4$ repetitions. The quantities $Res_F(q)$, computed from $\Psi(q)$, $E(W)$ and $b_F(q)$, illustrate how precise the estimators $W = K/n$ are and how precise the first-order formula for the bias $b_F(q)$ is.

n = 10, M = 6, q = 0.5, α = 0.6, sym = 10^4
Distribution    U(0,1)    Γ(,)    Γ(7,)    Par(0.)    Par(5.0)
Ψ(q)
E(W)
(sym)           ( )       ( )       (0.3900)  (0.4560)  ( )
Var(W)
(sym)           (0.089)   (0.0354)  (0.0004)  ( )       (0.0343)
b_F(q)
Res_F(q)

n = 50, M = 26, q = 0.5, α = 0.6, sym = 10^4
Distribution    U(0,1)    Γ(,)    Γ(7,)    Par(0.)    Par(5.0)
Ψ(q)
E(W)
(sym)           ( )       (0.345)   (0.369)   ( )       ( )
Var(W)
(sym)           ( )       ( )       ( )       ( )       ( )
b_F(q)
Res_F(q)

n = 100, M = 51, q = 0.5, α = 0.6, sym = 10^4
Distribution    U(0,1)    Γ(,)    Γ(7,)    Par(0.)    Par(5.0)
Ψ(q)
E(W)
(sym)           ( )       ( )       (0.505)   ( )       ( )
Var(W)
(sym)           ( )       ( )       (0.0337)  (0.034)   ( )
b_F(q)
Res_F(q)

Acknowledgment

I am very grateful to Professor Jacek Wesołowski for his valuable comments, which significantly improved the presentation.
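The simulation columns of Section 5 can be reproduced in spirit with a short Python sketch. It is a hedged illustration, not the author's program: the function names, the seed and the number of repetitions are assumptions. For the uniform parent distribution it also uses a standard property of uniform order statistics that is not taken from the paper: given $X_{M:n}$, the $M-1$ smaller observations behave like independent uniform variables on $(0, X_{M:n})$, so that $K$ is binomial with parameters $M-1$ and $\alpha$. This gives $E(W) = \alpha (M-1)/n$ and $\mathrm{Var}(W) = (M-1)\,\alpha(1-\alpha)/n^2$ as reference values.

```python
import random
from math import comb


def natural_estimate(sample, q=0.5, alpha=0.6):
    """W = K/n with K = max{j : X_{j:n} <= alpha * X_{M:n}}, M = [nq] + 1."""
    xs = sorted(sample)
    n = len(xs)
    m = int(n * q) + 1                           # M = [nq] + 1
    threshold = alpha * xs[m - 1]                # alpha * X_{M:n}
    k = sum(1 for x in xs if x <= threshold)     # K = number of X_j below it
    return k / n


def simulate_uniform(n=10, reps=20000, seed=1):
    """Monte Carlo mean and variance of W for a U(0,1) parent distribution."""
    rng = random.Random(seed)
    ws = [natural_estimate([rng.random() for _ in range(n)])
          for _ in range(reps)]
    mean = sum(ws) / reps
    var = sum((w - mean) ** 2 for w in ws) / (reps - 1)
    return mean, var


def uniform_exact_pmf(n=10, q=0.5, alpha=0.6):
    """P{K = k}, k = 0..M-1, for U(0,1): Binomial(M-1, alpha) (see lead-in)."""
    m = int(n * q) + 1
    return [comb(m - 1, k) * alpha ** k * (1 - alpha) ** (m - 1 - k)
            for k in range(m)]


if __name__ == "__main__":
    mean, var = simulate_uniform()
    pmf = uniform_exact_pmf()
    exact_mean = sum(k * p for k, p in enumerate(pmf)) / 10
    print("simulated E(W), Var(W):", mean, var)
    print("exact E(W) for U(0,1):", exact_mean)
```

For other parent distributions only the sampling line changes; the binomial shortcut is specific to the uniform case, and the general case requires formula (4).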

REFERENCES

BERGER, Y.G. and SKINNER, CH.J. (2003), Variance Estimation for a Low-Income Proportion. Social Statistics Research Centre, University of Southampton, Methodology Working Paper M03/03.
DAVID, H.A. and NAGARAJA, H.N. (2003), Order Statistics, Third Edition, Wiley.
DEVILLE, J.C. (1999), Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology 25,
EUROSTAT Doc. IPSE/65/04/EN. Joint working party with candidate countries. Statistics on income, poverty and social exclusion (IPSE) and EU/SILC (Statistics on income and living conditions).
KOVAČEVIĆ, M.S. and BINDER, D.A. (1997), Variance Estimation for Measures of Income Inequality and Polarization: The Estimating Equations Approach. Journal of Official Statistics, vol. 13, 41-58.

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4, pp.

MULTIVARIATE SAMPLE ALLOCATION: APPLICATION OF RANDOM SEARCH METHOD

Marcin Kozak
Department of Biometry, Warsaw Agricultural University, Poland

ABSTRACT

In the paper, sample allocation between strata or domains is discussed. Sample allocation has a big influence on the precision of estimation. In the univariate case, sample allocation is described quite satisfactorily in many textbooks, in contrast to the multivariate case. A random search method is proposed for allocating the sample. The convergence of the algorithm is presented using simulation studies for two artificial populations and real agricultural data. The application of the method is presented using data from a survey of micro enterprises (SP-3) conducted by the Central Statistical Office of Poland.

Key words: Domains, Strata, Multivariate sample allocation.

1. Introduction

Stratified sampling is one of the most often used sampling schemes. Sample allocation between strata has a big influence on the precision of the studied estimators. A stratum can be understood as a group of population elements fulfilling some requirements; quite often strata are formed by the so-called domains (cf. Särndal et al. 1992, Section 10). Then, stratified sampling is the scheme to be applied in the estimation process. A common problem is the choice of a sample allocation method. The allocation problem in the univariate case is described quite satisfactorily (see, e.g., Cochran 1977, Särndal et al. 1992, Bracha 1996). Kozak and Zieliński (2005) proposed a method of sample allocation between domains and strata within the domains (i.e., when the domains are grouped into the strata) in three different survey types. In the multivariate case, several analytical and numerical methods have been proposed (see, e.g., Greń 1964, 1966, Hartley 1965, Kokan and Khan 1967,

Wywiał 1988, 2003, Holmberg 00). However, none of the methods can yet be seen as the optimum one in all allocation problems. The aim of the paper is to present the usefulness of a random search method in sample allocation. Various multivariate allocation problems are described. The convergence of the algorithm will be presented using simulation studies for two artificial populations as well as for real agricultural data. The method will be applied to data from the survey of micro enterprises (SP-3) conducted by the Central Statistical Office of Poland.

2. Sample Allocation Problem

Let us consider a population $U$ consisting of $N$ elements; the population $U$ is subdivided into $L$ ($L \ge 2$) strata or domains $U_1, \dots, U_L$, $\bigcup_{h=1}^{L} U_h = U$. Our aim is to estimate the population means of $k$ characters under study $X_i$, $i = 1, \dots, k$. As usual, let us assume that the allocation (auxiliary) variables are the same as the survey ones. In the paper, we will consider the mean value as the parameter of interest, unless stated otherwise. Involving other parameters does not change the algorithm itself; only the formulas for the estimator and its variance or mean-square error change. We want to find an $L$-vector of sample sizes $\mathbf{n} = \{n_1, \dots, n_h, \dots, n_L\}$ (where $n_h$ is the sample size from stratum $h$, $h = 1, \dots, L$) that minimizes some function $f(\mathbf{n})$ under the constraints

$$n_h \le N_h \ \text{ for } h = 1, \dots, L; \qquad \sum_{h=1}^{L} n_h = n; \qquad \dots \tag{1}$$

The function $f(\mathbf{n})$ may take various forms, depending on the survey. In the univariate case it can be:

$$f_1(\mathbf{n}) = \sum_{h=1}^{L} W_h^2 \Bigl(1 - \frac{n_h}{N_h}\Bigr) \frac{S_h^2}{n_h}, \tag{2}$$

or

$$f_2(\mathbf{n}) = \max_{h = 1, \dots, L} \frac{1}{\bar{X}_h} \sqrt{\Bigl(1 - \frac{n_h}{N_h}\Bigr) \frac{S_h^2}{n_h}}, \tag{3}$$

where $N_h$ is the $h$th stratum size, $W_h = N_h / N$, $S_h$ is the population standard deviation of the variable under study $X$ restricted to the $h$th stratum, and $\bar{X}_h$ is the population mean of $X$ in the $h$th stratum. The minimization of the function (2) leads to a minimum variance of an estimator of the population mean. Such an allocation is called optimal and can be

obtained by means of the analytical formula proposed by Czuprow (1923) and Neyman (1934). Such sample sizes are real numbers that need rounding, which, however, can make the allocation non-optimal (Bracha 1996). Furthermore, in some cases it can happen that the sample size from a particular stratum (or the sample sizes from several strata) is larger than the stratum size (strata sizes); then the constraints (1) are not fulfilled. In such a case, one has to try some other procedure to allocate the sample. Hereafter we propose a random search method to be applied. I proposed its application to univariate stratification (Kozak 2004), and now we will apply a similar algorithm to multivariate sample allocation.

The function (3) concerns a situation in which we aim at precise estimation not only for the whole population, but also for domains. Such a problem was considered, for instance, by Lednicki and Wesołowski (1994) or Kozak and Zieliński (2005). Consider a situation in which we are interested in precise estimation for the whole population, but also in getting a coefficient of variation of the mean estimator in some stratum $l$ not larger than a given $c_l$. Then we can widen the constraints (1) for the function (2) with

$$\frac{1}{\bar{X}_l} \sqrt{\Bigl(1 - \frac{n_l}{N_l}\Bigr) \frac{S_l^2}{n_l}} \le c_l . \tag{4}$$

In such an approach just one stratum, as in (4), or several strata can be involved. For instance, we can design a survey in which we are interested in getting the minimum cv of the estimator under study for the whole of Poland, but we also require precise estimation for some large cities.

Let us consider some examples of objective functions that can be used in the multivariate case:

$$f_{m1}(\mathbf{n}) = \max_{i = 1, \dots, k} \sum_{h=1}^{L} W_h^2 \Bigl(1 - \frac{n_h}{N_h}\Bigr) \frac{S_{ih}^2}{n_h}, \tag{5}$$

$$f_{m2}(\mathbf{n}) = \max_{i = 1, \dots, k} \frac{1}{\bar{X}_i} \sqrt{\sum_{h=1}^{L} W_h^2 \Bigl(1 - \frac{n_h}{N_h}\Bigr) \frac{S_{ih}^2}{n_h}}, \tag{6}$$

$$f_{m3}(\mathbf{n}) = \max_{\substack{i = 1, \dots, k \\ h = 1, \dots, L}} \frac{1}{\bar{X}_{ih}} \sqrt{\Bigl(1 - \frac{n_h}{N_h}\Bigr) \frac{S_{ih}^2}{n_h}}, \tag{7}$$

where $S_{ih}^2$ is the population variance of the variable under study $X_i$ restricted to the $h$th stratum, and $\bar{X}_{ih}$ is the mean value of $X_i$ in the $h$th stratum.
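The multivariate objective functions (5)-(7) can be evaluated with a few lines of Python. The sketch below is an illustration under assumptions: it uses the usual stratified simple random sampling variance with finite population correction, the function and argument names are hypothetical, and the lower bound of one unit per stratum in `is_feasible` is an assumption added only to avoid division by zero.

```python
from math import sqrt


def stratum_term(n_h, N_h, S_h):
    """(1 - n_h/N_h) * S_h^2 / n_h: variance of the sample mean in one stratum."""
    return (1.0 - n_h / N_h) * S_h ** 2 / n_h


def variance_of_mean(n, N, S):
    """Variance of the stratified mean for one variable; S[h] is a std. dev."""
    total = sum(N)
    return sum((N_h / total) ** 2 * stratum_term(n_h, N_h, S_h)
               for n_h, N_h, S_h in zip(n, N, S))


def f_m1(n, N, S):
    """Largest variance over the k variables; S[i][h] is a std. dev."""
    return max(variance_of_mean(n, N, S_i) for S_i in S)


def f_m2(n, N, S, means):
    """Largest coefficient of variation over the k variables."""
    return max(sqrt(variance_of_mean(n, N, S_i)) / mean_i
               for S_i, mean_i in zip(S, means))


def f_m3(n, N, S, stratum_means):
    """Largest coefficient of variation over all variable-by-stratum cells."""
    return max(sqrt(stratum_term(n_h, N_h, S_ih)) / mean_ih
               for S_i, means_i in zip(S, stratum_means)
               for n_h, N_h, S_ih, mean_ih in zip(n, N, S_i, means_i))


def is_feasible(n, N, total):
    """Check sum(n_h) = n and n_h <= N_h (lower bound 1 is an assumption)."""
    return sum(n) == total and all(1 <= n_h <= N_h for n_h, N_h in zip(n, N))


if __name__ == "__main__":
    N = [100, 300]                    # stratum sizes (made-up numbers)
    n = [10, 30]                      # a candidate allocation
    S = [[2.0, 4.0], [1.0, 1.0]]      # std. devs of two variables by stratum
    print(is_feasible(n, N, 40), f_m1(n, N, S), f_m2(n, N, S, [10.0, 5.0]))
```

Any of these functions can serve as the objective $f(\mathbf{n})$ in a numerical search over feasible allocations.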

Vector n that minimizes the function (5) under the constraints (1) leads to the allocation that minimizes the maximum variance of the k estimators. Because we are mostly interested in the coefficients of variation of the estimators, i.e. $c_i = \sqrt{MSE(\bar{x}_i)} / \bar{x}_i$, $i = 1, \ldots, k$, and not in their variances, it is more appropriate to minimize the function (6). We choose minimization of the function (7) when we are interested in precise estimation not only for the whole population but also for domains, i.e., for the sections defined by the domain variables.

Let us consider the following case. Suppose we are interested in minimizing the variance of the mean estimator for the variable X, i.e., the function $f_1(n)$, but with the additional constraint that the variance of the mean estimator for the variable Y is not greater than a given value $D_Y$, i.e.

$$\sum_{h=1}^{L} W_h^2 \, \frac{N_h - n_h}{N_h n_h} \, S_{Yh}^2 \le D_Y. \tag{8}$$

In such a case, the constraints (1) for the function (2) have to be extended with the formula (8). In the constraint (8), we can use the cv of the estimator instead of its variance.

We have presented several sample allocation problems for multi-parameter surveys. The list could be extended with other examples. In the following section a random search algorithm is proposed as a useful tool for these problems.

3. The algorithm

The algorithm is based on the random search algorithm presented by Kozak (2004). We modify some of its aspects and adapt it to the sample allocation problem.

1. Choose an initial point $n^0$. Usually it will be the L-vector consisting of the elements

$$n_h^0 = n \frac{N_h}{N}, \quad h = 1, \ldots, L, \tag{9}$$

i.e., the vector of sample sizes from the proportional allocation, rounded to integers. Depending on the constraints (1), the initial point can be different; for instance, when they are extended with the constraint (8) (see Section 2), the initial point should be the optimal allocation vector for the variable Y. Calculate the function value $f^0 = f(n^0)$.

2. For r = 0, 1, ..., R, repeat the following steps. Generate the point $n'$ by randomly choosing two strata $L_1$ and $L_2$ and changing the allocation in the following way:

$$n'_{L_1} = n_{L_1} + j, \qquad n'_{L_2} = n_{L_2} - j, \qquad n'_h = n_h \ \text{ for } h = 1, \ldots, L,\ h \notin \{L_1, L_2\}, \tag{10}$$

where j is a random integer, $1 \le j \le 5$. (The randomness of j protects the algorithm against stopping in a local minimum and makes it faster; the upper bound can be fixed depending on the population and the sample size; the proposed value is good enough even for large samples.) If the constraints (1) are fulfilled and $f(n') < f(n^r)$, accept $n^{r+1} = n'$; otherwise $n^{r+1} = n^r$.

3. Finish the algorithm if the stopping rule is fulfilled, e.g., if r = R, where R is a given number of steps. Take the vector n as the final allocation.

4. Simulation studies on convergence of algorithm in univariate case

To study the convergence of the algorithm in the univariate case, two artificial populations $U_1$ and $U_2$ were generated. Their characteristics are given in Tables 1 and 2. The variances of the studied variables differed considerably between the strata.

Table 1. Characteristics of artificial population $U_1$

L N h 500,000 00, ,000,000 S h

Table 2. Characteristics of artificial population $U_2$

L N h S h L N h S h 5, ,000.0, , , , , , , , 

Two samples, one with the sampling fraction f = 0. and the second with f = 0.5, were each allocated a thousand times between the strata of both populations. As a stopping rule, a theoretical test was used (Stachurski and Wierzbicki 2001), in which the algorithm finishes its work when the vector n of sample sizes is equal to the vector of the optimum sample sizes and the function value reaches its minimum, i.e.,

$$f(n^k) = f(\hat{n}), \tag{11}$$

where $f(n^k)$ is the value of the function $f(n)$ in the step k, and $f(\hat{n})$ is the value of the function $f(n)$ in the global minimum point $\hat{n}$.

The number of steps required to reach the optimum point and the number of effective steps were studied. An effective step is a step in which the vector n changes. The algorithm reached the optimal allocation in all cases. The results are presented in Table 3.

Table 3. Results of simulation studies on the convergence of the random search algorithm for artificial populations

Population, sample fraction | Average number of steps | Average number of effective steps
U, f = U, f = U, f = 0. 3,98,67 U, f = 0.5 5,3,06

A similar study was carried out for data from the Agricultural Census 2002 regarding the area of cereals (variable X) and potatoes (Y). The frame consisted of farms with agricultural land larger than 1 ha. The population characteristics are presented in Table 4. The population (all farms in Poland) was subdivided into sixteen domains (voivodships). The samples of size 0,000 and 50,000 were allocated between the domains independently for each variable. Results are

presented in Table 5. Again, in all cases the algorithm reached the optimal allocation. Note that in spite of the large number of steps (see Table 5), the algorithm did not need much time to reach the optimal allocation in our studies: never more than a few seconds.

Table 4. Characteristics of farm population from Agricultural Census 2002 (X cereals area, Y potatoes area)

Voivodship | S xh | S yh | X | Y | N h
, , , , , , , , , , , , ,688 8, , ,93 3, ,837

Table 5. Results of the simulation studies on the convergence of the random search algorithm for the farm population (see Tab. 4)

Variable, sample size | Average number of steps | Average number of effective steps
X, n=0,000 3,45, X, n=50,000 4,3 5,974 Y, n=0,000 7,584 3,48 Y, n=50,000 3,897 5,768
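The random search of Section 3, in the univariate setting studied above, can be sketched in Python as follows. This is a minimal illustration and not the author's R implementation: the function names, the seed, the number of steps, the lower bound of two units per stratum, the range 1 to 5 for j, the strict-improvement acceptance rule and the toy population are all assumptions made for the example.

```python
import random

def variance(n, N, S2):
    """Objective (2): variance of the stratified mean estimator."""
    N_tot = sum(N)
    return sum((Nh / N_tot) ** 2 * (Nh - nh) / (Nh * nh) * s2
               for nh, Nh, s2 in zip(n, N, S2))

def random_search(n_total, N, S2, steps=20000, j_max=5, n_min=2, seed=0):
    L = len(N)
    if not n_min * L <= n_total <= sum(N):
        raise ValueError("no allocation satisfies the constraints")
    rng = random.Random(seed)
    N_tot = sum(N)
    # Step 1: proportional allocation rounded to integers, clipped to bounds.
    n = [max(n_min, min(Nh, round(n_total * Nh / N_tot))) for Nh in N]
    h = 0
    while sum(n) != n_total:          # repair rounding drift unit by unit
        step = 1 if sum(n) < n_total else -1
        if n_min <= n[h] + step <= N[h]:
            n[h] += step
        h = (h + 1) % L
    best = variance(n, N, S2)
    # Step 2: move j units from stratum b to stratum a; keep improvements.
    for _ in range(steps):
        a, b = rng.sample(range(L), 2)
        j = rng.randint(1, j_max)
        if n[a] + j > N[a] or n[b] - j < n_min:
            continue                  # constraints violated, reject
        n[a] += j
        n[b] -= j
        value = variance(n, N, S2)
        if value < best:
            best = value              # effective step
        else:
            n[a] -= j                 # undo the move
            n[b] += j
    # Step 3: fixed number of steps used as the stopping rule.
    return n, best

# Hypothetical three-stratum population.
N = [500, 300, 200]
S2 = [1.0, 25.0, 100.0]
n_final, f_final = random_search(100, N, S2)
```

Because only improving moves are accepted, the final value can never be worse than that of the proportional starting point; in this toy case the result should also be close to the rounded Czuprow-Neyman allocation.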

5. Numerical example: survey of micro enterprises (SP-3)

In the SP-3 survey, which is an important survey of the Central Statistical Office (CSO) of Poland, many parameters are estimated, in particular the total turnover of enterprises, for certain domains (so-called NZs) defined by the industry classification (NACE). The precision for those domains should be approximately the same. In our example we allocate the sample of size 00,000 between the NZs; let us assume that the totals of two variables are to be estimated, i.e., the number of employees ($X_1$) and the enterprise total turnover ($X_2$). The data originate from the 00 CSO survey. Because the most important variable is the enterprise total turnover $X_2$, we allocate the sample in such a way that the objective function is (4), with additional constraints on the precision of the estimator of the population total for the variable X in each domain, i.e.,

$$1.96 \, \frac{S_h}{\bar{x}_h} \sqrt{\frac{N_h - n_h}{N_h n_h}} \le 0.075, \quad h = 1, \ldots, 77. \tag{12}$$

The results of such allocation are presented in Table 6. Maximal precision for the number of employees, i.e., the variable $X_1$, was 0.9. The precision for number of employees was not greater than given

Table 6. Sample allocation between subpopulations in the survey of micro enterprises (NZ subpopulation, $X_1$ number of employees, $X_2$ total turnover, cv coefficient of variation for the particular variable, c precision of the estimator of the total value for the particular variable)

NZ | N NZ | n NZ | cv X | cv X | c X | c X
3, , , , ,08, ,039, ,57, , , ,009, , ,695, ,

139 STATISTICS IN TRANSITION, March NZ N NZ n NZ cv X cv X c X 4 7, ,50, , ,878, ,85, , ,99, ,75, , ,70, , , , ,8, , , , ,603, , ,80, , , ,974, , , , , , , , , c X

NZ | N NZ | n NZ | cv X | cv X | c X | c X
49 9,979, ,4, , ,933, , , , , , , ,497, ,640 5, ,6, ,658, ,8, ,78, , , ,39, , , , ,344 5, ,

6. Conclusion

The random search method has been applied in many areas in which numerical optimization is involved. In survey sampling, it has been used in univariate stratification (Niemiro 1999, Kozak 2004). In this paper, the random

search algorithm is applied to sample allocation between strata or domains. The convergence of the algorithm in the univariate case is demonstrated by means of simulation studies. An example based on real data from a survey with two variables is presented. The algorithm is quite easy to implement. I have used the R system (R Development Core Team 2004); however, other programming languages can be used as well. The algorithm proved to be fast and effective, so I recommend it for practical surveys that involve sample allocation between strata or domains.

REFERENCES

BRACHA, Cz. (1996), Theoretical Basis of Survey Sampling. PWN, Warsaw, Poland (in Polish).
COCHRAN, W. G. (1977), Sampling Techniques. John Wiley & Sons, New York.
CZUPROW, A. A. (1923), On the Mathematical Expectation of the Moments of Frequency Distributions in the Case of Correlated Observations. Metron.
GREŃ, J. (1964), Some Methods of Sample Allocation in Multi-Parameter Stratified Sampling. Przegląd Statystyczny (in Polish).
GREŃ, J. (1966), Some Application of Non-Linear Programming in Survey Sampling Method. Przegląd Statystyczny, 3, 03 7 (in Polish).
HARTLEY, H. O. (1965), Multiple Purpose Optimum Allocation in Stratified Sampling. Proceedings of the American Statistical Association, Social Statistics Section.
HOLMBERG, A. (2002), A Multiparameter Perspective on the Choice of Sampling Design in Survey. Statistics in Transition 5.
KOKAN, A. R. and KHAN, S. (1967), Optimum Allocation in Multivariate Surveys: an Analytical Solution. Journal of the Royal Statistical Society, B 9, 5 5.
KOZAK, M. (2004), Optimal Stratification Using Random Search Method in Agricultural Surveys. Statistics in Transition, 6 (8).
KOZAK, M. and ZIELIŃSKI, A. (2005), Sample Allocation between Domains and Strata. Int. J. Applied Math. Stat. 3.
LEDNICKI, B. and WESOŁOWSKI, J. (1994), Sample Allocation between Subpopulations. Wiadomości Statystyczne 9, 4 (in Polish).

NEYMAN, J. (1934), On the Two Different Aspects of the Representative Method: the Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society 97.
NIEMIRO, W. (1999), Optimal Stratification using Random Search Method. Wiadomości Statystyczne, 0, 9 (in Polish).
R DEVELOPMENT CORE TEAM (2004), R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
SÄRNDAL, C. E., SWENSSON, B. and WRETMAN, J. (1992), Model Assisted Survey Sampling. Springer-Verlag.
STACHURSKI, A. and WIERZBICKI, A. P. (2001), Basis of Optimization. Oficyna Wydawnicza Politechniki Warszawskiej, Warsaw, Poland (in Polish).
WYWIAŁ, J. (1988), Location of Sample in Strata Minimizing Spectral Radius of Variance-Covariance Matrix. Prace Naukowe Akademii Ekonomicznej we Wrocławiu, 404, 3 35 (in Polish).
WYWIAŁ, J. (2003), Some Contributions to Multivariate Methods in Survey Sampling. Prace Naukowe Akademii Ekonomicznej w Katowicach, Poland.

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4, pp.

REMARKS ON USING THE POLISH LFS DATA AND SAE METHODS FOR UNEMPLOYMENT ESTIMATION BY COUNTY

Jan Kubacki (Central Statistical Computing Centre, Al. Niepodleglosci 208, Warsaw, Poland. [email protected])

ABSTRACT

The author presents a synthetic overview of recent efforts related to the small area estimation methods applied to the Polish Labour Force Survey (PLFS). The review concerns the methodology and results obtained by the Central Statistical Office in connection with the PLFS and the National Census, and some results obtained by the author of this paper. The paper discusses various methods of estimation and evaluates the quality of such estimation, in particular in relation to the type of auxiliary data used for borrowing strength and to the efficiency of the initial estimates used in the models.

Key words: Small area estimation, labour force survey, model approach, Bayes estimation, quality of statistical data.

1. Introduction

The surveys, especially social surveys, prepared by the Polish Central Statistical Office are designed in a manner that allows most parameters to be estimated with acceptable precision only at the national and (partially) regional level. However, mainly due to the increasing demand for reliable data for small areas, and also because of European Regulation No 577/98 (1998) on the organisation of a labour force sample survey, it is necessary to prepare estimation techniques suitable for such needs. Recently some important improvements to the small area estimation methodology (in particular those concerned with the Polish Labour Force Survey, PLFS) were made in Polish official statistics. This was connected with publishing the results of the PLFS for areas smaller than regions (e.g. counties, poviats) together with publishing the results from the 2002 National Population Census (Bracha et

J. Kubacki: Remarks on Using the Polish LFS

al. 2003) and with the efforts to use complex estimation methods (especially empirical and hierarchical Bayes estimation), which are meant to improve the quality of such estimation (Bracha et al. 2004). One should mention here that, apart from the attempts made by official statisticians, many results were obtained by Polish academic researchers, especially by the group from the Poznan University of Economics that participated in the EURAREA project (Dehnel, Gołata, Klimanek 2004, and Gołata 2004) and also by the University of Łódź (Pekasiewicz, Pruska 00). Some results concerning small area estimation methods were obtained by the author of this work; they were presented at the Conference on Small Area Estimation in Riga (Kubacki 1999) and later published in an extended version (Kubacki 2000). Some results of the application of hierarchical Bayes estimation were also published recently (Kubacki 2004).

In this paper, the author presents improvements of the results obtained by Bracha et al. (2003), using hierarchical Bayes estimation based on the model proposed earlier (Kubacki 2004) and Polish Census data. The paper presents a model for unemployment which includes data from the Polish 2002 Census and makes it possible to obtain precise estimates of unemployment size for regions. An assessment of estimation quality based on different a priori estimates is also presented.

2. Brief description of PLFS design and estimation procedures

The Polish Labour Force Survey was originally designed to meet the requirements of the ILO recommendations concerning labour force surveys. It was conducted for the first time in May 1992 and its design remained unchanged until the 3rd quarter of 1999. Both before and after 1999 the survey included all persons in the household aged 15 and over. The sample is constructed using a two-stage sampling scheme. At the first stage the primary sampling units (PSUs) were selected using the Hartley-Rao method, with selection probability proportional to the number of occupied flats in the PSU.
The secondary sampling units were selected using simple random sampling. Until the changes in 1999 the PLFS was a quarterly survey. The sample was obtained using a rotation scheme: the sample in each quarter consisted of four independently selected elementary samples, and in every quarter a partial exchange of elementary samples was performed. The following rule was used: in each quarter four elementary samples were used, namely two elementary samples that had been employed during the previous quarter, one newly introduced elementary sample and one sample introduced the year before. More details can be found in the paper by Szarkowski and Witkowski (1994) and in the article by Kordos, Lednicki and Żyra (2002).

The PLFS has been re-designed since the 4th quarter of 1999. This was partially due to the changes in the administrative division of Poland. The sampling scheme (selection of PSUs and SSUs) was analogous to the earlier one, but the sample allocation by

voivodships (regions) was changed. The stratification was designed to obtain better precision of estimates by voivodship. The size of the sample representing each voivodship was nearly proportional to the square root of the number of dwellings in the voivodship. The rural-urban division within voivodships was also used, mainly to increase the precision for rural areas. Also, for organizational reasons, 8 dwellings were selected in each PSU in rural areas and small cities, 6-7 dwellings in cities and 5 dwellings in large cities.

The estimation procedure used in the PLFS is based on a set of weights: the universal weight F (the design weight), which is the reciprocal of the selection probability; the secondary weight, which accounts for non-interview factors; and the final weight, which uses the secondary weight and current population estimates. The design weight is obtained from the selection probability as follows:

$$F = \frac{1}{\pi_k}, \tag{2.1}$$

where $\pi_k$ is the selection probability for the k-th class of locality. Next the realisation coefficients $R_k$ are obtained using the formula

$$R_k = \frac{\hat{N}_k}{\hat{N}_k + B_k}, \tag{2.2}$$

where

$$\hat{N}_k = \sum_{j \in s_k} \frac{1}{\pi_j}, \tag{2.3}$$

$\pi_j$ is the selection probability of the j-th dwelling that belongs to the k-th class of locality, and $B_k$ is the number of non-interviewed dwellings. So the secondary weight is $F / R_k$, and the final weight for a person in the j-th dwelling who belongs to the l-th age-sex group and the k-th class of locality is as follows

where G G R l l k M kl = = = gˆ kl gˆ l W M l jkl R = M kl (.4) π R k j k

and here $G_l$ is the size of the population based on the current demographic estimates and population estimates from the PLFS using the following formula

x jl gl gˆ kl = x = = (.5) π R jl j s k π k Rk Rk j s k j k
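As a small numerical illustration of formulas (2.1) to (2.3), the realisation coefficient and the secondary weight can be computed as below. This is only a sketch: the function names and the example figures are hypothetical, and the final calibration step (formulas (2.4) and (2.5)) is not covered.

```python
def realisation_coefficient(pi_interviewed, n_noninterviewed):
    """R_k of (2.2), with N_hat_k computed as in (2.3).

    pi_interviewed: selection probabilities of the interviewed dwellings
    in the k-th class of locality; n_noninterviewed: B_k.
    """
    n_hat = sum(1.0 / p for p in pi_interviewed)
    return n_hat / (n_hat + n_noninterviewed)

def secondary_weight(pi_k, r_k):
    """Secondary weight F / R_k, with design weight F = 1 / pi_k as in (2.1)."""
    return (1.0 / pi_k) / r_k

# Hypothetical class of locality: 90 interviewed dwellings, each selected
# with probability 0.01, and B_k = 1000.
r_k = realisation_coefficient([0.01] * 90, 1000)
w_k = secondary_weight(0.01, r_k)
```

The smaller the realisation coefficient, the more the design weight is inflated to compensate for non-interviewed dwellings.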

3. Techniques of small area estimation applied to PLFS estimates

To improve the quality of estimates for the Polish LFS, the Central Statistical Office undertook research whose main goal was to select the best estimation technique (including the model approach) for various parameters at the local scale. The first attempts were presented at the International Scientific Conference Small Area Statistics and Survey Designs, organised in Warsaw in 1992 (see Kalton, Kordos and Platek (1993)). The paper by Bracha (1994) was one of the first methodological studies of small area estimation techniques in Poland. Recently two further papers by Bracha, Lednicki and Wieczorkowski were published (2003, 2004), which present the possibility of obtaining estimates for areas smaller than regions.

In the first paper, published in 2003, three types of estimators were used. The first was an estimator similar in form to the one described above. The second was the synthetic estimator, which has the following form:

for regions (voivodships)

$$x_w = t f_w, \tag{3.1}$$

where $f_w$ is the contribution of the particular variable for voivodship w to the whole country, and t is the estimator of that variable for the whole country;

for counties (poviats)

$$x_{wp} = t_w f_{wp}, \tag{3.2}$$

where $f_{wp}$ is the contribution (using Census 2002 data) of the particular variable for poviat p to voivodship w, and $t_w$ is the estimator of that variable for voivodship w.

The third form used was the composite estimator proposed by Griffiths (1996):

$$y_{wp} = v_{wp} t_{wp} + (1 - v_{wp}) x_{wp}, \tag{3.3}$$

where $v_{wp}$ is the weight of the direct estimator $t_{wp}$ for county p (in the paper by Bracha et al. (2003) it is equal to 0.5) and $x_{wp}$ is the synthetic estimator for county p in region w. These estimation methods were applied with Census 2002 data as the auxiliary variable.

In the second paper, published in 2004, the Bayesian approach was used in addition to the three estimators presented above.
Here empirical Bayes (EB) estimation and hierarchical Bayes (HB) estimation were applied to estimates that use the direct estimator (similar to the estimator used for the whole country). However, mainly because of the precision of the estimates, they were prepared for the whole year, not for a quarter. The results for the estimators of the form (3.1)-(3.3) were also presented.
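The synthetic and composite estimators (3.1) to (3.3) amount to a few multiplications. The sketch below uses hypothetical function names and made-up figures purely for illustration; only the default weight of 0.5 comes from the text.

```python
def synthetic_region(t_country, share_w):
    """(3.1): national estimate times the region's share f_w."""
    return t_country * share_w

def synthetic_county(t_region, share_wp):
    """(3.2): regional estimate times the county's share f_wp."""
    return t_region * share_wp

def composite_county(direct_wp, synthetic_wp, v_wp=0.5):
    """(3.3): weighted mean of the direct and the synthetic estimate."""
    return v_wp * direct_wp + (1.0 - v_wp) * synthetic_wp

# Hypothetical numbers: a region with an estimated 120,000 unemployed,
# a county holding 5% of them according to census data, and a direct
# county estimate of 7,000.
x_wp = synthetic_county(120_000, 0.05)
y_wp = composite_county(7_000, x_wp)
```

With equal weights the composite estimate lies exactly halfway between the direct and the synthetic estimate, which trades some of the direct estimator's variance for some of the synthetic estimator's bias.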

The basis for the empirical Bayes estimates was a regression model that uses data from unemployment registration and demographic estimates. Three dependent variables were estimated: the number of employed persons, the number of unemployed persons and the number of non-active persons. The models used the following explanatory variables: the total size of registered unemployment (for the particular level of aggregation); current population estimates (for the particular level of aggregation); data about unemployment at the county (poviat) level; and a qualitative variable responsible for the urban-rural factor. The last two explanatory variables were not present in the models where the urban-rural factor was considered. Such models were prepared for poviats in which more than 0 PSU were drawn in 2003.

The model has the following form:

$$\hat{\theta}_p = x_p^T b + u_p, \tag{3.4}$$

where b is the unknown vector of regression coefficients, $x_p$ represents the explanatory variables and $u_p$ is an independent random variable with distribution $u_p \sim N(0, \sigma_u^2)$. The model (3.4) can be rewritten in matrix form as follows:

$$\hat{\Theta} = X b + u. \tag{3.5}$$

The vector b can be obtained from the classic least-squares estimator, which has the form

$$\hat{b} = (X^T X)^{-1} X^T \hat{\Theta}. \tag{3.6}$$

Using these estimates and Bayesian inference, the empirical Bayes estimator has the following form:

$$y_p^{EB} = \alpha_p \hat{\theta}_p + (1 - \alpha_p) \tilde{\theta}_p, \tag{3.7}$$

where $\alpha_p$ is a constant chosen to minimize the MSE of the estimator (3.7), $\hat{\theta}_p$ is the estimator of the parameter $\theta_p$ from the survey sample, and $\tilde{\theta}_p = x_p^T \hat{b}$ is the predictor of that parameter for the poviat p. For empirical Bayes estimation the $\alpha_p$ has the form

$$\alpha_{p0} = \frac{D^2(\tilde{\theta}_p)}{MSE(\hat{\theta}_p) + D^2(\tilde{\theta}_p)}, \tag{3.8}$$

where

$$D^2(\tilde{\theta}_p) = S^2(\hat{u}) \, x_p^T (X^T X)^{-1} x_p \tag{3.9}$$

and $MSE(\hat{\theta}_p)$ is the estimated mean square error obtained from the sample for the parameter $\theta_p$. The value of $S^2(\hat{u})$ can be obtained from the residuals $\hat{u}_p = \hat{\theta}_p - \tilde{\theta}_p$:

$$S^2(\hat{u}) = \frac{1}{P - q} \sum_{p=1}^{P} \hat{u}_p^2. \tag{3.10}$$

The method that uses the hierarchical Bayes (HB) approach is based on the assumption that the prior distribution $f(\lambda)$ of the model parameters $\lambda$ is known, and the posterior distribution $f(\mu \mid y)$ of the small area parameters $\mu$ (which are the target of the inference) given the data y is obtained. The Bayes theorem is used here with the following reasoning. Suppose that we must obtain the desired posterior density

$$f(\mu \mid y) = \int f(\mu, \lambda \mid y) \, d\lambda. \tag{3.11}$$

Using Bayes inference we have

$$f(\mu, \lambda \mid y) = \frac{f(y, \mu \mid \lambda) f(\lambda)}{f_1(y)}, \tag{3.12}$$

where $f_1(y)$ is the marginal density of y and has the form

$$f_1(y) = \int\!\!\int f(y, \mu \mid \lambda) f(\lambda) \, d\mu \, d\lambda. \tag{3.13}$$

In simple cases the posterior density can be obtained analytically or by numerical integration of the marginal density (3.13). In complex situations, however, such integration becomes intractable, and Markov Chain Monte Carlo (MCMC) methods are often used to evaluate the target posterior density. Below are the assumptions of the hierarchical model used by Bracha, Lednicki and Wieczorkowski (2004):

$$\hat{\theta}_p \mid \theta_p, b, \sigma_u^2 \sim N(\theta_p, \hat{D}^2(\hat{\theta}_p)), \tag{3.14}$$

$$\theta_p \mid b, \sigma_u^2 \sim N(x_p^T b, \sigma_u^2), \tag{3.15}$$

$$b \sim N(\hat{b}, \sigma_u^2 (X^T X)^{-1}), \tag{3.16}$$

$$\sigma_u^{-2} \sim G(a, b), \tag{3.17}$$

where G denotes the Gamma distribution with shape parameter a and scale parameter b. These parameters are unknown and are assumed to be equal to a = b = 0.001. This assumption is made internally in the WinBUGS software that was used to obtain the estimates by the hierarchical Bayes method.

4. Summary discussion of small area estimates obtained for PLFS

For the data from the first paper (2003) the authors discussed the values of the coefficient of variation (CV) and of the coefficient of consistency for small area estimators. The coefficient of variation was obtained using a bootstrap technique analogous to the method proposed by P. J. McCarthy and C. B. Snowden (1985). The coefficient of consistency for the synthetic versus the direct estimator was obtained using the formula

$$z_{ws} = \frac{x_{ws} - t_{ws}}{t_{ws}}, \tag{4.1}$$

where $x_{ws}$ is obtained according to (3.1) and $t_{ws}$ is the direct estimator. Similarly, for the composite estimator

$$u_{ws} = \frac{y_{ws} - t_{ws}}{t_{ws}} = 0.5 \, z_{ws}, \tag{4.2}$$

where $y_{ws}$ is obtained analogously to (3.3) and $t_{ws}$ is the direct estimator. The second equality holds because, with the weight 0.5, $y_{ws} - t_{ws} = 0.5 (x_{ws} - t_{ws})$. Because of the dependence $u_{ws} = 0.5 z_{ws}$, only the coefficient of consistency between direct and synthetic, and between composite and synthetic, was investigated.

The earlier paper compared the performance of the different small area estimators. The distribution of CVs (presented both in graphs and in tables of deciles) shows that the synthetic estimator has the best precision and the composite estimator has intermediate precision. The direct estimator, as expected, has the worst performance. Moreover, the efficiency of the estimates is better when the considered small area is larger (for regions), which is easily explained, since the sample size for regions is much larger than for counties. However, because of the bias of synthetic estimates, the accuracy of the composite estimator may well be better than that of the synthetic estimator.
The distribution of CVs for regions and subregions is distinctly right-skewed in practically every situation considered. The analysis of the coefficient of consistency shows that the consistency of estimates is poor for unemployment estimates. This is mainly because the size of

such population is relatively small in comparison with the other groups (working and non-active persons). The distribution of this coefficient is almost symmetric in all cases (it seldom shows right asymmetry). The concentration of consistency coefficients for synthetic versus composite is larger than for synthetic versus direct, which means that particular deciles are larger (mostly two times) for synthetic vs. composite than for synthetic versus direct.

In both discussed papers (Bracha et al. (2003, 2004)) the results are presented for regions (voivodships), subregions and counties (poviats). However, the accuracy of the results obtained using direct, synthetic and composite estimators is limited, particularly because of unacceptable precision (as in the case of the direct estimator) or significant bias (in the case of the synthetic estimator). Also, for some counties (poviats) there are no observed data, or (mostly for poviats with fewer than 0 PSU selected) there are too few data to make credible estimates of most parameters. Here the model approach can be applied, for example using the empirical and hierarchical Bayes methods. The quality of such estimates depends on the size of the particular unit (i.e. county) and on the quality of the model used.

The results presented in the second paper (published in 2004) reveal that, although EB estimates are in most cases more precise than direct estimates, the CV characteristics are better for synthetic estimates (most CVs obtained for synthetic estimates are smaller than for the EB estimator). The distribution of CVs shows strong right asymmetry, and almost 75% of the values belong to the first two class intervals. The results of HB estimation show that the precision of such estimates is slightly lower than for EB estimators. Similarly, the distribution of the estimates is highly skewed, with strong right asymmetry. However, as Bracha et al.
(2004) pointed out, the characteristics of such estimates may depend on the assumed type of distribution (and particularly on the parameters of that distribution), and also on the implementation of the MCMC procedure in the software that produces the estimates.

The author of this paper also confirmed such behavior. Using different initial parameters for the model and a priori estimates of different quality (for example obtained using direct and synthetic estimation), it can be shown experimentally that this selection has a significant impact on the quality of the estimates. Such estimates were made for counties in the Łódzkie region. The model uses PLFS 2003 estimates from the paper by Bracha et al. (2004) together with data from administrative sources that are available on the Polish Public Statistics web pages. It will be interesting to compare similar estimates based on Census 2002 data, which may reveal the usefulness of census versus administrative data. Such a comparison may also reveal the accuracy of a model that uses Census explanatory variables and current data from administrative sources or statistical reports.

The distribution of the empirical and hierarchical Bayes estimates made by Bracha et al. (2004) is presented below.

Figure 1. Distribution of the coefficient of variation for PLFS estimates of the number of unemployed, using data from 2003, estimated by the empirical Bayes procedure

Figure 2. Distribution of the coefficient of variation for PLFS estimates of the number of unemployed, using data from 2003, estimated by the hierarchical Bayes procedure

The results presented in Tables 1 and 2 concern the estimates of unemployment obtained using three different estimation methods. These results show that in practically every case the model approach gives better precision. However, the comparison of the empirical and hierarchical Bayes estimators is not straightforward. For most units the HB approach is better than the EB approach, with the exception of the city of Łódź, where the relationship is reversed. Similar findings were presented in earlier work of the author (Kubacki 2004); however, those results were based on a different method of variance estimation, which uses the random group technique and may give a less stable variance estimate.

The precision of the estimates presented in Table 1 reveals relatively small variance values for model-based estimates. However, the question remains whether such an approach reveals the true variability of the investigated population. These estimates may be helpful in situations where the sample size is relatively small, but if they are used to describe the precision of the survey, such an assessment may be misleading.
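For reference, the empirical Bayes estimator compared here (formulas (3.6) to (3.10)) can be sketched for the simplest case of a regression with an intercept and one explanatory variable, so that q = 2. The Python code below is an illustration only: the function name and the data are hypothetical, and it relies on the standard identity that, for such a simple regression, $x_p^T (X^T X)^{-1} x_p = 1/P + (x_p - \bar{x})^2 / S_{xx}$.

```python
def eb_estimates(theta_hat, mse_hat, x):
    """Empirical Bayes estimates (3.7) with weights (3.8).

    theta_hat: direct estimates; mse_hat: their estimated MSEs;
    x: one explanatory variable per area (an intercept is added).
    """
    P, q = len(x), 2
    x_bar = sum(x) / P
    t_bar = sum(theta_hat) / P
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Least-squares fit, the q = 2 case of (3.6).
    slope = sum((xi - x_bar) * (ti - t_bar)
                for xi, ti in zip(x, theta_hat)) / sxx
    intercept = t_bar - slope * x_bar
    theta_tilde = [intercept + slope * xi for xi in x]
    # Residual variance (3.10).
    s2_u = sum((t - tt) ** 2
               for t, tt in zip(theta_hat, theta_tilde)) / (P - q)
    eb, alpha = [], []
    for t, tt, mse, xi in zip(theta_hat, theta_tilde, mse_hat, x):
        d2 = s2_u * (1.0 / P + (xi - x_bar) ** 2 / sxx)   # (3.9)
        a = d2 / (mse + d2)                               # (3.8)
        alpha.append(a)
        eb.append(a * t + (1.0 - a) * tt)                 # (3.7)
    return eb, alpha, theta_tilde

# Hypothetical data for five areas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
direct = [10.0, 22.0, 29.0, 41.0, 50.0]
eb, alpha, synth = eb_estimates(direct, [4.0] * 5, x)
```

Each estimate is pulled from the direct value towards the regression prediction, and the pull is stronger where the estimated MSE of the direct estimate is large relative to the variance of the prediction.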

Table 1. Comparison of unemployment estimates and their precision from the PLFS using direct, empirical Bayes and hierarchical Bayes estimation for the unemployment model using data from the 2003 PLFS for the Łódzkie voivodship (direct a priori estimates)

County | Unemployment estimates ('000): Direct est., EB est., HB est. | Coefficient of variation (%): Direct est., EB est., HB est.

Bełchatów
Brzeziny
Kutno
Łask
Łęczyca
Łowicz
Łódź-wschód
Opoczno
Pabianice
Pajęczno
Piotrków
Poddębice
Radomsko
Rawa Maz.
Sieradz
Skierniewice
Tomaszów M.
Wieluń
Wieruszów
Zduńska Wola
Zgierz
City of Łódź
City of Piotrków Trybunalski
City of Skierniewice

Source: own calculations.

A more detailed analysis of the results presented above also shows that there is a dependency between the size of the region (measured by the size of the working population) and the value of the variance reduction. Such a dependency was also found for the data from Table 1, where a positive reduction of variance for the EB method relative to the HB method was observed for the City of Łódź. Such a result, however, may not be the rule. It can be treated as a research proposal: analyzing this dependence may reveal the nature of both the empirical and the hierarchical estimators.

Figure 3. Coefficient of variation for PLFS estimates of the number of unemployed, using data from 2003, for counties in the Łódzkie region, obtained using the direct, empirical Bayes (EB) and hierarchical Bayes (HB) estimators

Table 2. Coefficient of variation reduction $(CV_{HB} - CV_{EB}) / CV_{EB}$ for estimates using empirical (EB) and hierarchical (HB) Bayes estimation

Region (voivodship) | Coefficient of variation (%): direct estimator, EB estimator, HB estimator | Coefficient of variation reduction (%)
Dolnośląskie 6.0,7,6-3,8
Kujawsko-pomorskie 6.9,,0-9,
Lubelskie 7.4 3,9 3,0-3,
Lubuskie 7. 4, 3,4-7,
Łódzkie 5.7,9,8-3,5
Małopolskie 7.0 3,3 3,5 6,
Mazowieckie , 40,0
Opolskie 9.6 8, 7,0-4,7
Podkarpackie 6.6 3, 3,0-3,3
Podlaskie 0.9 6,7 4,5-3,9
Pomorskie 7.3,8,3-7,9

Region (voivodship) | Coefficient of variation (%): direct estimator, EB estimator, HB estimator | Coefficient of variation reduction (%)
Śląskie ,8 6,7
Świętokrzyskie 8. 3,8,8-6,4
Warmińsko-mazursk ,,9-9,4
Wielkopolskie 6.8 3, 3,5 9,4
Zachodnio-pomor ,7-0

Figure 4. Dependency of the coefficient of variation on the size of the region population for LFS estimates of the number of unemployed, using data from 4th quarter 00 year, for Polish regions, obtained using the direct, empirical Bayes (EB) and hierarchical Bayes (HB) estimators

It should also be mentioned here that the comparison of these two methods is not obvious. Such a conclusion can be found, for example, in the recent paper by Sinha and Ghosh (2004), presented at the IMS/ASA's SRMS Joint Mini Meeting on Current Trends in Survey Sampling and Official Statistics, organized in January 2004 in Calcutta, India. A similar comparison can also be found in the paper by Ghosh and Rao (1994).

The comparison of the model for regions that uses the Census 2002 results shows that, in a situation where the precision of the whole model is better, the EB estimates are slightly more precise, especially for larger regions. This is presented in the table below.

5. Conclusions

The evaluation of different methods of small area estimation reveals that a model approach applied to direct estimates may make it possible to obtain more precise estimates. Even when the initial characteristics of the estimates are not satisfactory, using a properly constructed model and reliable auxiliary data may make it possible to obtain precise and consistent estimates. The authors of the results discussed here (Bracha et al. 2004) suggest that two types of estimators for small areas may be accepted. The first is a composite estimator of the form proposed by Griffiths. The second is the empirical Bayes estimator, which (as was shown experimentally) has better statistical performance than other model-based methods. As pointed out by Bracha et al., the estimation method currently used in the PLFS is useful for parameters related to the whole country, but it is not adequate for estimating parameters at lower levels of aggregation (especially for counties). Accordingly, the authors suggest the following solutions:
- modification of the procedure for obtaining the response rate, which is used in constructing the weights;
- using demographic data related to sex and age group for region estimates (according to the Eurostat recommendation);
- application of synthetic estimates to disaggregate the estimates at the region and county level;
- application of Bayesian methods for counties.

However, when the initial model has good statistical characteristics, empirical and hierarchical Bayes estimation give relatively similar precision and accuracy.
The quality of such estimation also depends on the selection of the a priori estimates, which is consistent with the results obtained for the 2003 PLFS data using different methods of initial estimation. Further examination of EB and HB models (for example, for counties across the whole country) may explain further statistical properties of this approach. The estimation of other parameters obtained from the PLFS, and the construction of models for such parameters, may also be interesting, partly because of the observed dependency on the quality of the a priori estimates.

REFERENCES

BRACHA, Cz. (1994). Methodological Aspects of Small Area Research (in Polish). Series "Z Prac Zakładu Badań Statystyczno-Ekonomicznych". GUS, Warsaw.
BRACHA, Cz., LEDNICKI, B., WIECZORKOWSKI, R. (2003). Data Estimation from Polish Labour Force Survey for counties in (in Polish). GUS, Warszawa.
BRACHA, Cz., LEDNICKI, B., WIECZORKOWSKI, R. (2004). Application of complex estimation methods to the disaggregation of data from Polish Labour Force Survey in 2003. GUS, Warszawa, Z Prac Zakładu Badań Statystyczno-Ekonomicznych, Zeszyt 99.
COUNCIL REGULATION (EC) No 577/98 (1998) on the organisation of a labour force sample survey in the Community.
DEHNEL, G., GOŁATA, E., KLIMANEK, T. (2004). Consideration of Optimal Sample Design for Small Area Estimation, Statistics in Transition, Vol. 6, No. 5.
GHOSH, M., RAO, J.N.K. (1994). Small Area Estimation: An Appraisal, Statistical Science, 9.
GOŁATA, E. (2004). Problems of Estimating Unemployment for Small Domains in Poland, Statistics in Transition, Vol. 6, No. 5.
GRIFFITHS, R. (1996). Current population survey small area estimation for congressional districts. Proceedings of the Section on Survey Research Methods. American Statistical Association.
KALTON, G., KORDOS, J. and PLATEK, R. (1993). Small Area Statistics and Survey Designs, Vol. I: Invited Papers; Vol. II: Contributed Papers and Panel Discussion. Central Statistical Office, Warsaw.
KORDOS, J., LEDNICKI, B., ŹYRA, M. (00). The Household Sample Surveys in Poland, Statistics in Transition, 5, 4.
KUBACKI, J. (1999). Evaluation of Some Small Area Methods for Polish Labour Force Survey in One Region of Poland, Proceedings of the IASS Satellite Conference on Small Area Estimation, Riga, Latvia.
KUBACKI, J. (2000). Some Small Area Estimation Methods for Polish Labour Force Survey in One Region of Poland, Statistics in Transition, 4, 5.

KUBACKI, J. (2004). Application of the Hierarchical Bayes Estimation to the Polish Labour Force Survey, Statistics in Transition, Vol. 6, No. 5.
MCCARTHY, P.J. and SNOWDEN, C.B. (1985). The Bootstrap and Finite Population Sampling. Vital and Health Statistics, pp. -95, Public Health Service Publication, U.S. Government Printing Office, Washington DC.
PEKASIEWICZ, D., PRUSKA, K. (00). Analysis of Distribution of Some Estimators in Small Area Statistics, Folia Oeconomica, 56, pp. 9.
SINHA, K., GHOSH, M. (2004). Empirical and Hierarchical Bayes Estimation in Finite Sampling Under Measurement Error Models, IMS/ASA's SRMS Joint Mini Meeting on Current Trends in Survey Sampling and Official Statistics, January 2004, Calcutta, India.
SZARKOWSKI, A., WITKOWSKI, J. (1994). The Polish Labour Force Survey, Statistics in Transition, No. 4.

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4

COMPARISONS OF THREE PRODUCT-TYPE OF ESTIMATORS IN SMALL SAMPLE 1

Arun K. Singh 2, Lakshmi N. Upadhyaya 3 and Housila P. Singh 4

ABSTRACT

In this paper we propose a class of unbiased product-type estimators for estimating the population mean $\bar{Y}$ of the study variate y using an auxiliary variate x in single-phase sampling. Expressions for the bias and mean square error of the proposed class of estimators are derived for small samples assuming a linear model, and its exact efficiency is compared with that of the usual unbiased estimator $\bar{y}$, the conventional product estimator $\bar{y}_p$ and the unbiased jack-knife product estimator $\bar{y}_{pj}$. The minimum mean square error of the proposed class of estimators is also derived. To simplify the discussion, we confine ourselves to simple random sampling and assume the population size to be infinite.

Key words: Auxiliary variable, unbiased, mean-square error, product-type estimator, relative efficiency.

1. Introduction

It is well known that the product method of estimation is quite effective when the correlation between the study variate y and the auxiliary variate x is negative (high). Let $(y_i, x_i)$ $(i = 1, 2, \ldots, n)$ denote observations on the variates (y, x) from a simple random sample of size n, and let $\bar{y}$ and $\bar{x}$ denote the sample means.

1 This paper was presented at the 57th Annual Conference of the Indian Society of Agricultural Statistics at G.B. Pant University of Agriculture and Technology, Pant Nagar, in February 2004.
2 SASRD, Nagaland University, Medziphema, Nagaland, India.
3 Department of Applied Mathematics, Indian School of Mines, Dhanbad, Jharkhand, India.
4 School of Studies in Statistics, Vikram University, Ujjain, Madhya Pradesh, India.

Assuming the population mean $\bar{X}$ of the auxiliary variate x to be known, we define the well-known product estimator for $\bar{Y}$ as

$\bar{y}_p = \bar{y}\,(\bar{x} / \bar{X})$ (1.1)

In large samples, to the first degree of approximation, it is known that $\bar{y}_p$ is better than the sample mean $\bar{y}$ when $\rho\,(C_y / C_x) < -(1/2)$, where ρ is the correlation coefficient between y and x, and $C_x$ and $C_y$ are the coefficients of variation of x and y respectively.

A method of reducing the bias in an estimator was first introduced by Quenouille (1956), and was named the jack-knife method by Tukey (1958). Let the sample of size n be split randomly into two sub-groups, each of size n/2. Then we define the unbiased product-type estimator for $\bar{Y}$ as

$\bar{y}_{pj} = p^{*} / \bar{X}$ (1.2)

where $p^{*} = [2p - (1/2)(p_1 + p_2)]$, $p = \bar{y}\,\bar{x}$, $p_i = \bar{y}_i\,\bar{x}_i$ $(i = 1, 2)$, and $(\bar{y}_i, \bar{x}_i)$, $i = 1, 2$, are the i-th sub-sample means of y and x respectively.

2. The suggested class of unbiased estimators

In this paper, we consider a linear combination of the unbiased estimator $\bar{y}$ and the jack-knife estimator $\bar{y}_{pj}$ and propose a class of estimators for the population mean as

$\bar{y}_a = (1 - w)\,\bar{y} + w\,\bar{y}_{pj}$ ; $0 < w < 1$ (2.1)

where w is a constant weight to be determined suitably. To derive the bias and mean square error of the proposed class of estimators, we consider Durbin's linear model, which is of the form

$y_i = \alpha + \beta x_i + u_i$ ; $\beta < 0$
$E(u_i \mid x_i) = 0$
$E(u_i u_j \mid x_i, x_j) = 0$ for $i \neq j$
$E(u_i^2 \mid x_i) = n\gamma$ (2.2)

where γ is a non-random quantity of order $1/n$ and $x_i / n$ has a Gamma distribution with the parameter h. The sample mean $\bar{x}$ then has a Gamma distribution with parameter $m = nh$, so that $\bar{X} = m$.

Recently, considerable attention has been given to the construction of ratio-type estimators in small sample surveys. However, not much is known about the exact efficiency of product estimators in small samples. Therefore, we consider the exact efficiency of product estimators assuming a linear model. It should be mentioned that the model (2.2) was used earlier by Durbin (1959), Rao and Webster (1966), Rao (1981), Chakraborty (1973, 1979), Singh and Singh (1999) and others.

Under the model (2.2), the bias and mean square error of the proposed class of estimators $\bar{y}_a$ can be expressed as

$B(\bar{y}_a) = E(\bar{y}_a - \bar{Y}) = 0$ (2.3)

and

$MSE(\bar{y}_a) = m^{-1} w^2 \alpha^2 + [m(1-w)^2 + 4(m+1)w^2 + 4mw(1-w)]\,\beta^2 + [4w^2 + 2w(1-w)]\,\alpha\beta + [(1-w)^2 + m^{-1}(m+2)w^2 + 2w(1-w)]\,\gamma$ (2.4)

Putting w = 0 and w = 1 in (2.4), the mean square errors of the estimators $\bar{y}$ and $\bar{y}_{pj}$ are obtained as

$MSE(\bar{y}) = m\,[\beta^2 + m^{-1}\gamma]$ (2.5)

and

$MSE(\bar{y}_{pj}) = [m^{-1}\alpha^2 + 4(m+1)\beta^2 + 4\alpha\beta + m^{-1}(m+2)\gamma]$ (2.6)

We can write the mean square error of the estimator $\bar{y}_p$, under the model (2.2), as

$MSE(\bar{y}_p) = m^{-1}\,[\alpha^2 + (4m^2 + 11m + 6)\beta^2 + 4(m+1)\alpha\beta + (m+1)\gamma]$ (2.7)

The MSE of the estimator $\bar{y}_a$ is minimum for

$w = w_{opt} = -[\beta(\alpha + m\beta)]\,[m^{-1}\alpha^2 + (m+4)\beta^2 + 2\alpha\beta + 2m^{-1}\gamma]^{-1}$ (2.8)

with minimum MSE

$\min MSE(\bar{y}_a) = [m\beta^2 + \gamma] - [\beta(\alpha + m\beta)]^2\,[m^{-1}\alpha^2 + (m+4)\beta^2 + 2\alpha\beta + 2m^{-1}\gamma]^{-1}$ (2.9)

In terms of the model (2.2), the values of α, β and γ can be expressed as functions of $\bar{Y}$, m, K and ρ as

$\alpha = \bar{Y}(K - \rho)K^{-1}$, $\beta = \bar{Y}\rho\,(mK)^{-1}$ and $\gamma = \bar{Y}^2 (1 - \rho^2)(mK^2)^{-1}$,

where $K = C_x / C_y$.
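The estimators and the mean square error expressions above can be sketched in Python as follows. This is only an illustration under stated assumptions: all function names are hypothetical, the random split of the sample is fixed by an arbitrary `seed`, the Gamma parameter of the sample mean is taken as m = nh (so that the population mean of x equals m), and the coefficients follow my reading of expressions (2.4) to (2.9) rather than a verified copy of the printed formulas.

```python
import random
from statistics import fmean


def class_estimator(y, x, x_pop_mean, w, seed=0):
    """Return (ybar, ybar_p, ybar_pj, ybar_a) for one sample of even size."""
    n = len(y)
    if n != len(x) or n < 2 or n % 2:
        raise ValueError("need paired samples of even size")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # random split into two halves
    halves = (idx[: n // 2], idx[n // 2:])
    ybar, xbar = fmean(y), fmean(x)
    p = ybar * xbar
    p_sub = [fmean([y[i] for i in h]) * fmean([x[i] for i in h]) for h in halves]
    p_star = 2 * p - 0.5 * sum(p_sub)         # jack-knifed product
    ybar_p = p / x_pop_mean                   # product estimator
    ybar_pj = p_star / x_pop_mean             # jack-knife product estimator
    ybar_a = (1 - w) * ybar + w * ybar_pj     # proposed class
    return ybar, ybar_p, ybar_pj, ybar_a


def model_parameters(rho, m, K, y_mean=1.0):
    """alpha, beta, gamma expressed through rho, m, K and the mean of y."""
    alpha = y_mean * (K - rho) / K
    beta = y_mean * rho / (m * K)
    gamma = y_mean ** 2 * (1 - rho ** 2) / (m * K ** 2)
    return alpha, beta, gamma


def mse_mean(alpha, beta, gamma, m):
    return m * beta ** 2 + gamma


def mse_product(alpha, beta, gamma, m):
    return (alpha ** 2 + (4 * m ** 2 + 11 * m + 6) * beta ** 2
            + 4 * (m + 1) * alpha * beta + (m + 1) * gamma) / m


def mse_class(w, alpha, beta, gamma, m):
    """MSE of the class; w = 0 gives the mean, w = 1 the jack-knife estimator."""
    return (w ** 2 * alpha ** 2 / m
            + (m * (1 - w) ** 2 + 4 * (m + 1) * w ** 2 + 4 * m * w * (1 - w)) * beta ** 2
            + (4 * w ** 2 + 2 * w * (1 - w)) * alpha * beta
            + ((1 - w) ** 2 + (m + 2) * w ** 2 / m + 2 * w * (1 - w)) * gamma)


def w_opt(alpha, beta, gamma, m):
    denom = alpha ** 2 / m + (m + 4) * beta ** 2 + 2 * alpha * beta + 2 * gamma / m
    return -beta * (alpha + m * beta) / denom


alpha, beta, gamma = model_parameters(rho=-0.5, m=8, K=0.5)
w_star = w_opt(alpha, beta, gamma, 8)
print(round(w_star, 4))  # prints 0.4444
```

The value 0.4444 coincides with one of the bracketed optimum weights in Table 6, which gives some reassurance that the reading of the formulas used here is the intended one.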

The MSEs of the estimators considered here are quite intricate. In order to obtain a tangible idea of the behaviour of the suggested estimators, we have computed the relative efficiencies, in percent, of the estimators $\bar{y}_p$, $\bar{y}_{pj}$ and $\bar{y}_a$ with respect to the sample mean $\bar{y}$ from the following:

$REF(\bar{y}_p, \bar{y}) = e_p = [MSE(\bar{y}) / MSE(\bar{y}_p)] \times 100$,
$REF(\bar{y}_{pj}, \bar{y}) = e_{pj} = [MSE(\bar{y}) / MSE(\bar{y}_{pj})] \times 100$, and
$REF(\bar{y}_a, \bar{y}) = e_a = [MSE(\bar{y}) / MSE(\bar{y}_a)] \times 100$

for selected values of ρ, m, K and w, and compiled them in Tables 1 to 6. The optimum values of w are indicated in brackets in Table 6.

3. Recommendations

The results of Tables 1 to 6 may be summarized as follows:

(i) For w = 0.25, 0.50 and 0.75 and fixed K, the efficiency of the proposed class of estimators $\bar{y}_a$ increases monotonically as m and ρ increase. For small values of m the efficiency of $\bar{y}_a$ increases rapidly, but when m becomes large (m 3) the increase in efficiency is slow.

(ii) The efficiency attains stability for fixed values of K, large values of m ( 3) and high correlation.

(iii) The proposed class of estimators $\bar{y}_a$ is found to be more efficient than the simple mean estimator $\bar{y}$ except when
for w = 0.25: (a) .5 K .0, ρ - 0. and m 8; (b) K = , ρ = - 0. and m 6;
for w = 0.50: (a) 0.50 K .0, ρ - 0. and 8 m 3; (b) K , ρ and 8 m 3;
for w = 0.75: (a) 0.50 K .0, ρ and 8 m 3; (b) K .5, ρ = and 8 m 3; (c) K , ρ and 8 m 3.

(iv) From Tables 1 and 4, we observe that for w = 0.25 the proposed class of estimators $\bar{y}_a$ beats the usual product estimator $\bar{y}_p$ except when (a) K = , ρ = ; (b) K , ρ and 6 m 3.

(v) From Tables 2 and 4, we conclude that for w = 0.50 the proposed class of estimators $\bar{y}_a$ is more efficient than the usual product estimator $\bar{y}_p$ except when (a) K 0.50, ρ and 6 m 3; (b) K = , ρ and m 3.

(vi) From Tables 3 and 4, it can be seen that for the fixed value w = 0.75 the proposed class of estimators $\bar{y}_a$ is superior to the usual product estimator $\bar{y}_p$ except when K 0.50, ρ and 6 m 3.

(vii) From Tables 1 and 5, we observe that for w = 0.25 $\bar{y}_a$ is superior to the estimator $\bar{y}_{pj}$ except when (a) K 0.50, ρ and 6 m 0; (b) K = .0, ρ = and m 6; (c) K 0.50, ρ and m 3.

(viii) From Tables 2, 3 and 5, it can be observed that the proposed class of estimators $\bar{y}_a$ is more efficient than the estimator $\bar{y}_{pj}$ for w = 0.50 and K 0.50, ρ and m 6, and for w = 0.75 and K 0.50, ρ and m 0.

(ix) From Tables 4, 5 and 6, it can be seen that the proposed class of estimators $\bar{y}_a$, under the optimum condition, is more efficient than the usual estimator $\bar{y}$, the product estimator $\bar{y}_p$ and the unbiased jack-knife product estimator $\bar{y}_{pj}$.
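The statement in (ix) that the optimum member of the class is never less efficient than the sample mean can be checked by substituting the expressions for α, β and γ into the optimum weight and the minimum MSE. The following is a sketch of that substitution, assuming the reading of (2.5), (2.8) and (2.9) in which the population mean of x equals m; it is a worked check, not part of the original derivation.

```latex
\begin{align*}
\alpha + m\beta &= \bar{Y}, \\
\mathrm{MSE}(\bar{y}) &= m\beta^{2} + \gamma = \frac{\bar{Y}^{2}}{mK^{2}}, \\
w_{opt} &= \frac{-\rho\, m K}{mK^{2} + 2\rho^{2} + 2}, \\
\min \mathrm{MSE}(\bar{y}_a) &= \bar{Y}^{2}\left[\frac{1}{mK^{2}}
      - \frac{\rho^{2}}{mK^{2} + 2\rho^{2} + 2}\right], \\
e_a^{opt} &= 100\,\frac{mK^{2} + 2\rho^{2} + 2}{mK^{2}(1-\rho^{2}) + 2\rho^{2} + 2} \;\ge\; 100 .
\end{align*}
```

For example, m = 8, K = 0.5 and ρ = -0.4 give $w_{opt} = 1.6/4.32 \approx 0.3704$, which agrees with one of the bracketed values in Table 6, and an optimum relative efficiency of $100 \times 4.32/4.00 = 108$ percent.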

Table 1. The exact relative efficiency (in percent) of the estimator $\bar{y}_a$ when w = 0.25 for selected values of m, K and ρ
[Columns m = 8, m = 16, m = 20, m = 32; rows indexed by K and ρ; numerical entries not recoverable.]

Table 2. The exact relative efficiency (in percent) of the estimator $\bar{y}_a$ when w = 0.50 for selected values of m, K and ρ
[Columns m = 8, m = 16, m = 20, m = 32; rows indexed by K and ρ; numerical entries not recoverable.]

Table 3. The exact relative efficiency (in percent) of the estimator $\bar{y}_a$ when w = 0.75 for selected values of m, K and ρ
[Columns m = 8, m = 16, m = 20, m = 32; rows indexed by K and ρ; numerical entries not recoverable.]

Table 4. The exact relative efficiency (in percent) of the estimator $\bar{y}_p$ for selected values of m, K and ρ
[Columns m = 8, m = 16, m = 20, m = 32; rows indexed by K and ρ; numerical entries not recoverable.]

Table 5. The exact relative efficiency (in percent) of the estimator $\bar{y}_{pj}$ for selected values of m, K and ρ
[Columns m = 8, m = 16, m = 20, m = 32; rows indexed by K and ρ; numerical entries not recoverable.]

Table 6. The exact optimum relative efficiencies (in percent) of the proposed class of estimators $\bar{y}_a$ for selected values of m, K and ρ
[Rows indexed by K, columns by ρ. Only the bracketed optimum values of w are preserved, as transcribed; the efficiency entries are not recoverable.]

m = 8
(0.96) (0.3704) (0.4444) (0.56) (0.6060)
(0.587) (0.300) (0.3809) (0.500) (0.5674)
(0.95) (0.36) (0.97) (0.4040) (0.45)
(0.094) (0.865) (0.39) (0.30) (0.368)

m = 16
(0.63) (0.5063) (0.930) (0.803) (0.879)
(0.769) (0.3493) (0.434) (0.5900) (0.6639)
(0.60) (0.505) (0.36) (0.4309) (0.4887)
(0.0968) (0.930) (0.406) (0.3344) (0.3805)

m = 20
(0.85) (0.5464) (0.6666) (0.877) (0.966)
(0.8) (0.3584) (0.4444) (0.609) (0.6873)
(0.74) (0.536) (0.358) (0.4376) (0.497)
(0.0975) (0.944) (0.44) (0.3374) (0.3843)

m = 32
(0.375) (0.60) (0.769) (.000) (.347)
(0.878) (0.379) (0.4638) (0.6404) (0.756)
(0.96) (0.583) (0.3) (0.448) (0.50)
(0.0984) (0.964) (0.45) (0.340) (0.3900)

REFERENCES

CHAKRABORTY, R.P. (1973): A note on small sample theory of the ratio estimator in certain specified populations. Journal of Indian Society of Agricultural Statistics, 5.
CHAKRABORTY, R.P. (1979): Some ratio-type estimators. Journal of Indian Society of Agricultural Statistics, 3.
DURBIN, J. (1959): A note on the application of Quenouille's method of bias reduction in estimation of ratios. Biometrika, 46.
QUENOUILLE, M.H. (1956): Notes on bias in estimation. Biometrika, 43.
RAO, J.N.K. and WEBSTER, J.T. (1966): On two methods of bias reduction in estimation of ratios. Biometrika, 53.
RAO, P.S.R.S. (1981): Efficiencies of nine two-phase ratio estimators for the mean. Journal of American Statistical Association, 76, 374.
SINGH, A.K. and SINGH, H.P. (1999): Efficiency of a class of unbiased ratio-type estimators under a linear model. Journal of Indian Society of Agricultural Statistics, 5(), 8 36.

STATISTICS IN TRANSITION, March 2006
Vol. 7, No. 4

OBITUARY

Mikołaj Latuch (93 - 2005)

Professor of economics, Dr. hab. Mikołaj Latuch, left us forever on 3rd October 2005. His legacy comprises highly appreciated research studies, academic textbooks, and numerous papers devoted to demography, statistics, social policy, and migration. He remained true to these scientific areas throughout his professional life.

Professor Mikołaj Latuch began his professional career within the academic environment of the former Central School of Planning and Statistics (now the Warsaw School of Economics, SGH), from which he graduated in 1954 and where he obtained his Ph.D. and habilitation degrees. In 1974 he was granted the title of Associate Professor. He supervised several doctoral dissertations. He actively studied demographic and migration phenomena and processes taking place in Poland and neighbouring countries, particularly those that had become a place of exile for many Poles.

For many years Professor M. Latuch was the Director of the Institute of Social Economy in the former Central School of Planning and Statistics, and he was appointed Head of the Chair of Socio-Economic Demography in the Warsaw School of Economics. There he initiated numerous empirical surveys and completed many immensely interesting and significant studies. His individual and collective research works, such as The Basics of Demography, Studies on Contemporary International Migration and The Reasons for Repatriation and Emigration of Population in Contemporary Poland, deserve particular attention.

Professor Mikołaj Latuch was a great social activist. He participated actively in numerous social organizations and scientific societies, particularly in the Polish Statistical Association and the Polish Demographic Society. He held the position of Chairman of the Demographic Sciences Committee of the Polish Academy of Sciences. For a couple of terms he was a member of the Government Population Council. He also participated in the activities of the Scientific Statistical Council functioning alongside the Central Statistical Office. In 1993 he was named Man of the Year in Demography by the American Biographical Institute.

In 98, Professor Mikołaj Latuch, together with a group of young academic staff of the Socio-Economic Chair of the former Central School of Planning and Statistics, initiated the foundation of a scientific association of Polish demographers. From its very beginning the Professor participated actively in the work of the Main Board of the Polish Demographic Society (PDS), holding various important functions for many terms. He was also the Chairman of the Warsaw Division of the PDS for many years. True to the Society's ideas, he continued to popularise knowledge of demographic processes, as well as awareness of the social, economic and cultural factors influencing these processes.

Professor Mikołaj Latuch was immensely engaged in the activities that resulted in the establishment of the social organization continuing the tradition of the Polish Statistical Association (PSA) from the inter-war period. From April 98 he held the function of PSA President, to which he was appointed by the Interim Main Council. In November 98, during the First General Assembly of the PSA Delegates, Professor Latuch was elected President of the Polish Statistical Society. Professor M. Latuch initiated or co-organised many scientific conferences, seminars and symposia organised by the PSA. He never ceased in his efforts to aid statisticians and demographers. He co-operated closely with the statisticians of the Central Statistical Office, for which he earned our sincere gratitude.

The Professor was the author of many papers published, inter alia, in the statistical journals issued by the Central Statistical Office and the Polish Statistical Association. His most recent articles were mainly devoted to the spatial mobility of young Poles, an issue that held a particular place in the Professor's research work.

Until the end of his days the Professor was very active and devoted to statistics and to social and demographic problems, despite a severe illness that increasingly hindered him in carrying out his noble ideals and extensive research plans.

State of Louisiana Office of Information Technology. Change Management Plan

State of Louisiana Office of Information Technology. Change Management Plan State of Louisiana Office of Information Technology Change Management Plan Table of Contents Change Management Overview Change Management Plan Key Consierations Organizational Transition Stages Change

More information

Detecting Possibly Fraudulent or Error-Prone Survey Data Using Benford s Law

Detecting Possibly Fraudulent or Error-Prone Survey Data Using Benford s Law Detecting Possibly Frauulent or Error-Prone Survey Data Using Benfor s Law Davi Swanson, Moon Jung Cho, John Eltinge U.S. Bureau of Labor Statistics 2 Massachusetts Ave., NE, Room 3650, Washington, DC

More information

A SPATIAL UNIT LEVEL MODEL FOR SMALL AREA ESTIMATION

A SPATIAL UNIT LEVEL MODEL FOR SMALL AREA ESTIMATION REVSTAT Statistical Journal Volume 9, Number 2, June 2011, 155 180 A SPATIAL UNIT LEVEL MODEL FOR SMALL AREA ESTIMATION Authors: Pero S. Coelho ISEGI Universiae Nova e Lisboa, Portugal Faculty of Economics,

More information

The one-year non-life insurance risk

The one-year non-life insurance risk The one-year non-life insurance risk Ohlsson, Esbjörn & Lauzeningks, Jan Abstract With few exceptions, the literature on non-life insurance reserve risk has been evote to the ultimo risk, the risk in the

More information

Modelling and Resolving Software Dependencies

Modelling and Resolving Software Dependencies June 15, 2005 Abstract Many Linux istributions an other moern operating systems feature the explicit eclaration of (often complex) epenency relationships between the pieces of software

More information

Enterprise Resource Planning

Enterprise Resource Planning Enterprise Resource Planning MPC 6 th Eition Chapter 1a McGraw-Hill/Irwin Copyright 2011 by The McGraw-Hill Companies, Inc. All rights reserve. Enterprise Resource Planning A comprehensive software approach

More information

Data Center Power System Reliability Beyond the 9 s: A Practical Approach

Data Center Power System Reliability Beyond the 9 s: A Practical Approach Data Center Power System Reliability Beyon the 9 s: A Practical Approach Bill Brown, P.E., Square D Critical Power Competency Center. Abstract Reliability has always been the focus of mission-critical

More information

RUNESTONE, an International Student Collaboration Project

RUNESTONE, an International Student Collaboration Project RUNESTONE, an International Stuent Collaboration Project Mats Daniels 1, Marian Petre 2, Vicki Almstrum 3, Lars Asplun 1, Christina Björkman 1, Carl Erickson 4, Bruce Klein 4, an Mary Last 4 1 Department

More information

A Data Placement Strategy in Scientific Cloud Workflows

A Data Placement Strategy in Scientific Cloud Workflows A Data Placement Strategy in Scientific Clou Workflows Dong Yuan, Yun Yang, Xiao Liu, Jinjun Chen Faculty of Information an Communication Technologies, Swinburne University of Technology Hawthorn, Melbourne,

More information

On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies On Aaboost an Optimal Betting Strategies Pasquale Malacaria 1 an Fabrizio Smerali 1 1 School of Electronic Engineering an Computer Science, Queen Mary University of Lonon, Lonon, UK Abstract We explore

More information

JON HOLTAN. if P&C Insurance Ltd., Oslo, Norway ABSTRACT

JON HOLTAN. if P&C Insurance Ltd., Oslo, Norway ABSTRACT OPTIMAL INSURANCE COVERAGE UNDER BONUS-MALUS CONTRACTS BY JON HOLTAN if P&C Insurance Lt., Oslo, Norway ABSTRACT The paper analyses the questions: Shoul or shoul not an iniviual buy insurance? An if so,

More information

10.2 Systems of Linear Equations: Matrices

10.2 Systems of Linear Equations: Matrices SECTION 0.2 Systems of Linear Equations: Matrices 7 0.2 Systems of Linear Equations: Matrices OBJECTIVES Write the Augmente Matrix of a System of Linear Equations 2 Write the System from the Augmente Matrix

More information

How To Segmentate An Insurance Customer In An Insurance Business

How To Segmentate An Insurance Customer In An Insurance Business International Journal of Database Theory an Application, pp.25-36 http://x.oi.org/10.14257/ijta.2014.7.1.03 A Case Stuy of Applying SOM in Market Segmentation of Automobile Insurance Customers Vahi Golmah

More information

An intertemporal model of the real exchange rate, stock market, and international debt dynamics: policy simulations

An intertemporal model of the real exchange rate, stock market, and international debt dynamics: policy simulations This page may be remove to conceal the ientities of the authors An intertemporal moel of the real exchange rate, stock market, an international ebt ynamics: policy simulations Saziye Gazioglu an W. Davi

More information

Chapter 9 AIRPORT SYSTEM PLANNING

Chapter 9 AIRPORT SYSTEM PLANNING Chapter 9 AIRPORT SYSTEM PLANNING. Photo creit Dorn McGrath, Jr Contents Page The Planning Process................................................... 189 Airport Master Planning..............................................

More information

A New Evaluation Measure for Information Retrieval Systems

A New Evaluation Measure for Information Retrieval Systems A New Evaluation Measure for Information Retrieval Systems Martin Mehlitz [email protected] Christian Bauckhage Deutsche Telekom Laboratories [email protected] Jérôme Kunegis [email protected]

More information

A Generalization of Sauer s Lemma to Classes of Large-Margin Functions

A Generalization of Sauer s Lemma to Classes of Large-Margin Functions A Generalization of Sauer s Lemma to Classes of Large-Margin Functions Joel Ratsaby University College Lonon Gower Street, Lonon WC1E 6BT, Unite Kingom [email protected], WWW home page: http://www.cs.ucl.ac.uk/staff/j.ratsaby/

More information

Professional Level Options Module, Paper P4(SGP)

Professional Level Options Module, Paper P4(SGP) Answers Professional Level Options Moule, Paper P4(SGP) Avance Financial Management (Singapore) December 2007 Answers Tutorial note: These moel answers are consierably longer an more etaile than woul be

More information

Unsteady Flow Visualization by Animating Evenly-Spaced Streamlines

Unsteady Flow Visualization by Animating Evenly-Spaced Streamlines EUROGRAPHICS 2000 / M. Gross an F.R.A. Hopgoo Volume 19, (2000), Number 3 (Guest Eitors) Unsteay Flow Visualization by Animating Evenly-Space Bruno Jobar an Wilfri Lefer Université u Littoral Côte Opale,

More information

Improving Direct Marketing Profitability with Neural Networks

Improving Direct Marketing Profitability with Neural Networks Volume 9 o.5, September 011 Improving Direct Marketing Profitability with eural etworks Zaiyong Tang Salem State University Salem, MA 01970 ABSTRACT Data mining in irect marketing aims at ientifying the

More information

ThroughputScheduler: Learning to Schedule on Heterogeneous Hadoop Clusters

ThroughputScheduler: Learning to Schedule on Heterogeneous Hadoop Clusters ThroughputScheuler: Learning to Scheule on Heterogeneous Haoop Clusters Shehar Gupta, Christian Fritz, Bob Price, Roger Hoover, an Johan e Kleer Palo Alto Research Center, Palo Alto, CA, USA {sgupta, cfritz,

More information

View Synthesis by Image Mapping and Interpolation

View Synthesis by Image Mapping and Interpolation View Synthesis by Image Mapping an Interpolation Farris J. Halim Jesse S. Jin, School of Computer Science & Engineering, University of New South Wales Syney, NSW 05, Australia Basser epartment of Computer

More information

Option Pricing for Inventory Management and Control

Option Pricing for Inventory Management and Control Option Pricing for Inventory Management an Control Bryant Angelos, McKay Heasley, an Jeffrey Humpherys Abstract We explore the use of option contracts as a means of managing an controlling inventories

More information

Achieving quality audio testing for mobile phones

Achieving quality audio testing for mobile phones Test & Measurement Achieving quality auio testing for mobile phones The auio capabilities of a cellular hanset provie the funamental interface between the user an the raio transceiver. Just as RF testing

More information

Towards a Framework for Enterprise Architecture Frameworks Comparison and Selection

Towards a Framework for Enterprise Architecture Frameworks Comparison and Selection Towars a Framework for Enterprise Frameworks Comparison an Selection Saber Aballah Faculty of Computers an Information, Cairo University [email protected] Abstract A number of Enterprise Frameworks

More information

An introduction to the Red Cross Red Crescent s Learning platform and how to adopt it

An introduction to the Red Cross Red Crescent s Learning platform and how to adopt it An introuction to the Re Cross Re Crescent s Learning platform an how to aopt it www.ifrc.org Saving lives, changing mins. The International Feeration of Re Cross an Re Crescent Societies (IFRC) is the

More information

Performance And Analysis Of Risk Assessment Methodologies In Information Security

Performance And Analysis Of Risk Assessment Methodologies In Information Security International Journal of Computer Trens an Technology (IJCTT) volume 4 Issue 10 October 2013 Performance An Analysis Of Risk Assessment ologies In Information Security K.V.D.Kiran #1, Saikrishna Mukkamala

More information

Heat-And-Mass Transfer Relationship to Determine Shear Stress in Tubular Membrane Systems Ratkovich, Nicolas Rios; Nopens, Ingmar

Heat-And-Mass Transfer Relationship to Determine Shear Stress in Tubular Membrane Systems Ratkovich, Nicolas Rios; Nopens, Ingmar Aalborg Universitet Heat-An-Mass Transfer Relationship to Determine Shear Stress in Tubular Membrane Systems Ratkovich, Nicolas Rios; Nopens, Ingmar Publishe in: International Journal of Heat an Mass Transfer

More information

Using research evidence in mental health: user-rating and focus group study of clinicians preferences for a new clinical question-answering service

Using research evidence in mental health: user-rating and focus group study of clinicians preferences for a new clinical question-answering service DOI: 10.1111/j.1471-1842.2008.00833.x Using research evience in mental health: user-rating an focus group stuy of clinicians preferences for a new clinical question-answering service Elizabeth A. Barley*,

More information

FAST JOINING AND REPAIRING OF SANDWICH MATERIALS WITH DETACHABLE MECHANICAL CONNECTION TECHNOLOGY

FAST JOINING AND REPAIRING OF SANDWICH MATERIALS WITH DETACHABLE MECHANICAL CONNECTION TECHNOLOGY FAST JOINING AND REPAIRING OF SANDWICH MATERIALS WITH DETACHABLE MECHANICAL CONNECTION TECHNOLOGY Jörg Felhusen an Sivakumara K. Krishnamoorthy RWTH Aachen University, Chair an Insitute for Engineering

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS Contents 1. Moment generating functions 2. Sum of a ranom number of ranom variables 3. Transforms

More information

MSc. Econ: MATHEMATICAL STATISTICS, 1995 MAXIMUM-LIKELIHOOD ESTIMATION

MSc. Econ: MATHEMATICAL STATISTICS, 1995 MAXIMUM-LIKELIHOOD ESTIMATION MAXIMUM-LIKELIHOOD ESTIMATION The General Theory of M-L Estimation In orer to erive an M-L estimator, we are boun to make an assumption about the functional form of the istribution which generates the

More information

Unbalanced Power Flow Analysis in a Micro Grid

Unbalanced Power Flow Analysis in a Micro Grid International Journal of Emerging Technology an Avance Engineering Unbalance Power Flow Analysis in a Micro Gri Thai Hau Vo 1, Mingyu Liao 2, Tianhui Liu 3, Anushree 4, Jayashri Ravishankar 5, Toan Phung

More information

INFLUENCE OF GPS TECHNOLOGY ON COST CONTROL AND MAINTENANCE OF VEHICLES

INFLUENCE OF GPS TECHNOLOGY ON COST CONTROL AND MAINTENANCE OF VEHICLES 1 st Logistics International Conference Belgrae, Serbia 28-30 November 2013 INFLUENCE OF GPS TECHNOLOGY ON COST CONTROL AND MAINTENANCE OF VEHICLES Goran N. Raoičić * University of Niš, Faculty of Mechanical

More information

Stock Market Value Prediction Using Neural Networks

Stock Market Value Prediction Using Neural Networks Mahdi Pakdaman Naeini IT & Computer Engineering Department Islamic Azad University Parand Branch e-mail: [email protected] Hamidreza Taremian Engineering

Software Diversity for Information Security

Software Diversity for Information Security Pei-yu Chen, Gaurav Kataria and Ramayya Krishnan,3 Heinz School, Tepper School and 3 Cylab Carnegie Mellon University Abstract: In this paper we analyze a software diversification-based

EU Water Framework Directive vs. Integrated Water Resources Management: The Seven Mismatches

Water Resources Development, Vol. 20, No. 4, 565-575, December 2004 EU Water Framework Directive vs. Integrated Water Resources Management: The Seven Mismatches MUHAMMAD MIZANUR RAHAMAN, OLLI VARIS & TOMMI

Firewall Design: Consistency, Completeness, and Compactness

Firewall Design: Consistency, Completeness, and Compactness Mohamed G. Gouda and Xiang-Yang Alex Liu Department of Computer Sciences The University of Texas at Austin Austin, Texas 78712-1188,

Optimal Control Policy of a Production and Inventory System for multi-product in Segmented Market

RATIO MATHEMATICA 25 (2013), 29-46 ISSN: 1592-7415 Optimal Control Policy of a Production and Inventory System for multi-product in Segmented Market Kuldeep Chaudhary, Yogender Singh, P. C. Jha Department of Operational

Sustainability Through the Market: Making Markets Work for Everyone

www.corporate-env-strategy.com Sustainability and the Market Sustainability Through the Market: Making Markets Work for Everyone Peter White Sustainable development is about ensuring a better quality of

Cross-Over Analysis Using T-Tests

Chapter 35 Cross-Over Analysis Using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous random variable that follows

Risk Management for Derivatives

Risk Management for Derivatives The Greeks are coming the Greeks are coming! Managing risk is important to a large number of individuals and institutions The most fundamental aspect of business is a process where

Quality signposting: the role of online information prescription in providing patient information

Quality signposting: the role of online information prescription in providing patient information Liz Brewster & Barbara Sen Information School,

Product Differentiation for Software-as-a-Service Providers

University of Augsburg Prof. Dr. Hans Ulrich Buhl Research Center Finance & Information Management Department of Information Systems Engineering & Financial Management Discussion Paper WI-99 Product Differentiation

Seeing the Unseen: Revealing Mobile Malware Hidden Communications via Energy Consumption and Artificial Intelligence

Seeing the Unseen: Revealing Mobile Malware Hidden Communications via Energy Consumption and Artificial Intelligence Luca Caviglione, Mauro Gaggero, Jean-François Lalande, Wojciech Mazurczyk, Marcin Urbanski

MODELLING OF TWO STRATEGIES IN INVENTORY CONTROL SYSTEM WITH RANDOM LEAD TIME AND DEMAND

Part I. Probabilistic Models Computer Modelling and New Technologies 27 Vol. No. 2-3 Transport and Telecommunication Institute Lomonosova Riga LV-9 Latvia MODELLING OF TWO STRATEGIES IN INVENTORY CONTROL SYSTEM WITH RANDOM LEAD TIME AND

Cost Efficient Datacenter Selection for Cloud Services

Cost Efficient Datacenter Selection for Cloud Services Hong Xu, Baochun Li henryxu, [email protected] Department of Electrical and Computer Engineering University of Toronto Abstract Many cloud services

Optimal Energy Commitments with Storage and Intermittent Supply

Submitted to Operations Research manuscript OPRE-2009-09-406 Optimal Energy Commitments with Storage and Intermittent Supply Jae Ho Kim Department of Electrical Engineering, Princeton University, Princeton,

Mathematics Review for Economists

Mathematics Review for Economists by John E. Floyd University of Toronto May 9, 2013 This document presents a review of very basic mathematics for use by students who plan to study economics in graduate school

The higher education factor: The role of higher education in the hiring and promotion practices in the fire service. By Nick Geis.

The higher education factor: The role of higher education in the hiring and promotion practices in the fire service. By Nick Geis Spring 2012 A paper submitted to the faculty of The University of North Carolina

Rural Development Tools: What Are They and Where Do You Use Them?

Faculty Paper Series Faculty Paper 00-09 June, 2000 Rural Development Tools: What Are They and Where Do You Use Them? By Dennis U. Fisher Professor and Extension Economist [email protected] Judith I. Stallmann

Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes Huawei Shen 1, Dashun Wang 2, Chaoming Song 3, Albert-László

Forecasting and Staffing Call Centers with Multiple Interdependent Uncertain Arrival Streams

Forecasting and Staffing Call Centers with Multiple Interdependent Uncertain Arrival Streams Han Ye Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, [email protected]

Consumer Referrals. Maria Arbatskaya and Hideo Konishi. October 28, 2014

Consumer Referrals Maria Arbatskaya and Hideo Konishi October 28, 2014 Abstract In many industries, firms reward their customers for making referrals. We analyze the optimal policy mix of price, advertising intensity,

CALCULATION INSTRUCTIONS

Energy Saving Guarantee Contract Appendix 8 CALCULATION INSTRUCTIONS Calculation Instructions for the Determination of the Energy Costs Baseline, the Annual Amounts of Savings and the Remuneration 1 Basics All prices

HOST SELECTION METHODOLOGY IN CLOUD COMPUTING ENVIRONMENT

International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) HOST SELECTION METHODOLOGY IN CLOUD COMPUTING ENVIRONMENT Pawan Kumar, Pijush Kanti Dutta Pramanik Computer Science

Hull, Chapter 11 + Sections 17.1 and 17.2 Additional reference: John Cox and Mark Rubinstein, Options Markets, Chapter 5

Binomial Model Hull, Chapter 11 + Sections 17.1 and 17.2 Additional reference: John Cox and Mark Rubinstein, Options Markets, Chapter 5 1. One-Period Binomial Model Creating synthetic options (replicating options)

A New Pricing Model for Competitive Telecommunications Services Using Congestion Discounts

A New Pricing Model for Competitive Telecommunications Services Using Congestion Discounts N. Keon and G. Anandalingam Department of Systems Engineering University of Pennsylvania Philadelphia, PA 19104-6315

USING SIMPLIFIED DISCRETE-EVENT SIMULATION MODELS FOR HEALTH CARE APPLICATIONS

Proceedings of the 2011 Winter Simulation Conference S. Jain, R.R. Creasey, J. Himmelspach, K.P. White, and M. Fu, eds. USING SIMPLIFIED DISCRETE-EVENT SIMULATION MODELS FOR HEALTH CARE APPLICATIONS Anthony

Digital barrier option contract with exponential random time

IMA Journal of Applied Mathematics Advance Access published June 9, IMA Journal of Applied Mathematics ) Page of 9 doi:.93/imamat/hxs3 Digital barrier option contract with exponential random time Doobae Jun

Mandate-Based Health Reform and the Labor Market: Evidence from the Massachusetts Reform

Mandate-Based Health Reform and the Labor Market: Evidence from the Massachusetts Reform Jonathan T. Kolstad Wharton School, University of Pennsylvania and NBER Amanda E. Kowalski Department of Economics, Yale

Supporting Adaptive Workflows in Advanced Application Environments

Supporting Adaptive Workflows in Advanced Application Environments Manfred Reichert, Clemens Hensinger, Peter Dadam Department Databases and Information Systems University of Ulm, D-89069 Ulm, Germany Email: {reichert,

Manure Spreader Calibration

Agronomy Facts 68 Manure Spreader Calibration Manure spreader calibration is an essential and valuable nutrient management tool for maximizing the efficient use of available manure nutrients. Planned manure

Calibration of the broad band UV Radiometer

Calibration of the broad band UV Radiometer Marian Morys and Daniel Berger Solar Light Co., Philadelphia, PA 19126 ABSTRACT Mounting concern about the ozone layer depletion and the potential ultraviolet exposure

Math 230.01, Fall 2012: HW 1 Solutions

Math 230.01, Fall 2012: HW 1 Solutions Problem (p.9 #). Suppose a word is picked at random from this sentence. Find: a) the chance the word has at least letters; SOLUTION: All words are equally likely to be chosen. The

S&P Systematic Global Macro Index (S&P SGMI) Methodology

S&P Systematic Global Macro Index (S&P SGMI) Methodology May 2014 S&P Dow Jones Indices: Index Methodology Table of Contents Introduction 3 Overview 3 Highlights 4 The S&P SGMI Methodology 4 Index Family 5 Index

Lecture L25-3D Rigid Body Kinematics

J. Peraire, S. Widnall 16.07 Dynamics Fall 2008 Version 2.0 Lecture L25 - 3D Rigid Body Kinematics In this lecture, we consider the motion of a 3D rigid body. We shall see that in the general three-dimensional

Parameterized Algorithms for d-hitting Set: the Weighted Case Henning Fernau. Univ. Trier, FB 4 Abteilung Informatik 54286 Trier, Germany

Parameterized Algorithms for d-Hitting Set: the Weighted Case Henning Fernau Trierer Forschungsberichte; Trier: Technical Reports Informatik / Mathematik No. 08-6, July 2008 Univ. Trier, FB 4 Abteilung Informatik

Gender Differences in Educational Attainment: The Case of University Students in England and Wales

Gender Differences in Educational Attainment: The Case of University Students in England and Wales ROBERT MCNABB 1, SARMISTHA PAL 1, AND PETER SLOANE 2 ABSTRACT This paper examines the determinants of gender

DIFFRACTION AND INTERFERENCE

DIFFRACTION AND INTERFERENCE In this experiment you will demonstrate the wave nature of light by investigating how it bends around edges and how it interferes constructively and destructively. You will observe

The Impact of Forecasting Methods on Bullwhip Effect in Supply Chain Management

The Impact of Forecasting Methods on Bullwhip Effect in Supply Chain Management HX Sun, YT Ren Department of Industrial and Systems Engineering, National University of Singapore, Singapore School of Mechanical and

11 CHAPTER 11: FOOTINGS

CHAPTER ELEVEN FOOTINGS 11.1 Introduction Footings are structural elements that transmit column or wall loads to the underlying soil below the structure. Footings are designed to transmit

Wage Compression, Employment Restrictions, and Unemployment: The Case of Mauritius

WP/04/205 Wage Compression, Employment Restrictions, and Unemployment: The Case of Mauritius Nathan Porter 2004 International Monetary Fund WP/04/205 IMF Working Paper Finance Department Wage Compression,

Measures of distance between samples: Euclidean

Chapter 4 Measures of distance between samples: Euclidean We will be talking a lot about distances in this book. The concept of distance between two samples or between two variables is fundamental in multivariate

ISSN: 2277-3754 ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 12, June 2014

Manufacturing process with disruption under Quadratic Demand for Deteriorating Inventory

GPRS performance estimation in GSM circuit switched services and GPRS shared resource systems *

GPRS performance estimation in GSM circuit switched services and GPRS shared resource systems * Shaoji Ni and Sven-Gustav Häggman Helsinki University of Technology, Institute of Radio Communications, Communications

Dow Jones Sustainability Group Index: A Global Benchmark for Corporate Sustainability

www.corporate-env-strategy.com Sustainability Index Dow Jones Sustainability Group Index: A Global Benchmark for Corporate Sustainability Ivo Knoepfel Increasingly investors are diversifying their portfolios

In 1975, there were 79 degree-granting creative-writing programs in North America.1

Literature as a Network: Creative-Writing Scholarship in Literary Magazines Harriett E. Green abstract: With the increase in undergraduate and graduate programs for creative writing at

Minimum-Energy Broadcast in All-Wireless Networks: NP-Completeness and Distribution Issues

Minimum-Energy Broadcast in All-Wireless Networks: NP-Completeness and Distribution Issues Mario Čagalj LCA-EPFL CH-05 Lausanne Switzerland [email protected] Jean-Pierre Hubaux LCA-EPFL CH-05 Lausanne Switzerland

Aon Retiree Health Exchange

2014-2015 Medicare Insurance Guide Aon Retiree Health Exchange Recommended by Why You Need More Coverage I already have coverage. Aren't Medicare Parts A and B enough? For many people, Medicare alone does not provide

DECISION SUPPORT SYSTEM FOR MANAGING EDUCATIONAL CAPACITY UTILIZATION IN UNIVERSITIES

DECISION SUPPORT SYSTEM FOR MANAGING EDUCATIONAL CAPACITY UTILIZATION IN UNIVERSITIES Svetlana Vinnik 1, Marc H. Scholl 2 Abstract Decision-making in the field of academic planning involves extensive analysis

How To Evaluate Power Station Performance

Proceedings of the World Congress on Engineering and Computer Science 20 Vol II, October 9-2, 20, San Francisco, USA Performance Evaluation of Egbin Thermal Power Station, Nigeria I. Emovon, B. Kareem, and

Hybrid Model Predictive Control Applied to Production-Inventory Systems

Preprint of paper to appear in the 18th IFAC World Congress, August 28 - Sept. 2, 2011, Milan, Italy Hybrid Model Predictive Control Applied to Production-Inventory Systems Naresh N. Nandola Daniel E. Rivera Control

A Blame-Based Approach to Generating Proposals for Handling Inconsistency in Software Requirements

International Journal of Knowledge and Systems Science, 3(), -7, January-March 0 A Blame-Based Approach to Generating Proposals for Handling Inconsistency in Software Requirements Kedian Mu, Peking University,

Risk Adjustment for Poker Players

Risk Adjustment for Poker Players William Chin DePaul University, Chicago, Illinois Marc Ingenoso Conger Asset Management LLC, Chicago, Illinois September, 2006 Introduction In this article we consider risk

A Comparison of Performance Measures for Online Algorithms

A Comparison of Performance Measures for Online Algorithms Joan Boyar 1, Sandy Irani 2, and Kim S. Larsen 1 1 Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55,

A Theory of Exchange Rates and the Term Structure of Interest Rates

Review of Development Economics, 17(1), 74-87, 2013 DOI:10.1111/roe.1016 A Theory of Exchange Rates and the Term Structure of Interest Rates Hyoung-Seok Lim and Masao Ogaki* Abstract This paper defines the

An Introduction to Event-triggered and Self-triggered Control

An Introduction to Event-triggered and Self-triggered Control W.P.M.H. Heemels K.H. Johansson P. Tabuada Abstract Recent developments in computer and communication technologies have led to a new type of large-scale