Detecting Possibly Fraudulent or Error-Prone Survey Data Using Benford's Law


David Swanson, Moon Jung Cho, John Eltinge
U.S. Bureau of Labor Statistics
2 Massachusetts Ave., NE, Room 3650, Washington, DC

Key Words: Consumer Expenditure Interview Survey; Curbstoning; Digit preference; Pearson test statistic; Quantitative survey responses; Reinterview.

Any opinions expressed in this paper are those of the authors, and do not constitute policy of the Bureau of Labor Statistics.

1. Introduction

The Consumer Expenditure Survey (CE) is a nationwide household survey conducted by the U.S. Bureau of Labor Statistics (BLS) to find out how Americans spend their money. As with any survey, the accuracy of CE's published expenditure estimates depends on the accuracy of the collected data. The CE survey has several procedures already in place to ensure the accuracy of its published expenditure estimates. They include reinterviews of some respondents, computerized checks for the logical consistency of responses given by respondents, an outlier review of individual survey responses, and another outlier review of the summarized expenditure estimates before they are published (BLS Handbook of Methods).

In this paper we describe another method of identifying inaccurate survey data. The method is little-known, but it has been rapidly gaining popularity over the past decade. The method involves examining the distribution of the leading (or leftmost) digits of all the numbers reported on a survey form. These leading digits have been observed to follow a certain distribution regardless of the nature of the survey. This phenomenon is called Benford's Law. Knowing the expected distribution of the leading digits, one can flag the interviews in which the leading digits do not follow that distribution; such data are unusual and may be fraudulent or generated by an error-prone process.

In this paper we will describe the Consumer Expenditure Survey and the current methods used in that survey to identify inaccurate data.
Then we will describe Benford's Law and some of its applications in other settings, and we will give an example, using CE data, of how Benford's Law can be used to identify unusual data in a survey setting.

2. Background

The Consumer Expenditure Survey is a nationwide household survey conducted by the BLS to find out how Americans spend their money. Data for the survey are collected by the Bureau of the Census under contract with the BLS. One of the primary uses of the data is to provide expenditure weights for the Consumer Price Index. Data are collected by personal visits to the households in the survey's sample.

The Consumer Expenditure Survey consists of two separate surveys, the Diary (CED) and Quarterly Interview (CEQ) surveys. The purpose of the CED is to obtain detailed expenditure data on small, frequently purchased items such as food and apparel. The purpose of the CEQ is to obtain detailed expenditure data on large items such as property, automobiles, and major appliances, and on expenses that occur on a regular basis such as rent, utility bills, and insurance premiums. Approximately 3,500 households are visited each quarter of the year in the CED, and 15,000 households in the CEQ.

The CED uses a new sample of households each quarter of the year. Each household in the CED is asked to keep a record of all its expenditures made during a 2-week period. After participating in the survey for 2 weeks the household is dropped from the survey, and it is replaced by another household.

The CEQ is a panel rotation survey. Each panel is interviewed for five consecutive quarters, and then dropped from the survey. As one panel leaves the survey, a new panel is introduced. Approximately 20 percent of the addresses are new to the survey each quarter.
3. Current Methods of Identifying Problematic Data in the CE Survey

The CEQ and CED surveys currently have several methods of identifying incorrect data. The first method is a reinterview process in which a field representative's supervisor telephones a small number of respondents who participated in the survey to find out whether the respondent was actually visited by the field representative, and to verify the accuracy of a few of their responses. Some respondents are randomly selected, while others are selected because the supervisor is suspicious of the data's accuracy. The reinterview process is mainly intended to catch curbstoners, field representatives who make up the data without ever visiting or contacting the respondent.

All of the remaining methods of checking the data are intended to identify legitimate data that were incorrectly recorded or keyed. The methods include a computerized check for logical consistency of the responses, an outlier analysis for individual reported observations, and another outlier analysis on the summarized expenditure estimates before they are published. An example of a logical consistency error is a box checked off to indicate that no expenditures were made in a certain item category, with an expenditure nevertheless reported in that category. Logical consistency errors are easy for a computer to find.

The outlier review process for individual reported expenditures involves identifying observations that are unusually large, and then investigating them to find out whether they are accurate or seem reasonable. Photocopies of the completed survey forms are stored on microfilm, and an examination of the survey forms sometimes reveals keying errors, such as a misplaced decimal point changing a reported expenditure from $2.99 to $[...]. CE's outlier analysis focuses on large expenditures rather than small expenditures because large outliers have a much larger impact on the final published expenditure estimates.
CE uses four methods of identifying outliers:

• The largest gap test. The mean expenditure is calculated for each dollar field within each item code. The expenditures above the mean are sorted in descending order, and the difference (or gap) between each expenditure and the one below it is calculated. The largest of these gaps is identified, and all expenditures above it are flagged for review.
• If the reported expenditure is the largest value within its area/item combination, it is flagged for review.
• If the reported expenditure is greater than 25% of the total of all expenditures within its area/item combination (50% is used instead of 25% if the number of expenditures is below 10), it is flagged for review.
• If the reported expenditure is greater than 20 times the median reported expenditure within its area/item combination, it is flagged for review.

Every observation flagged as an outlier by one or more of these tests is printed on an outlier review listing. To help reviewers focus on the more extreme outliers, each outlier is given a score that essentially reflects the number of tests that flagged it.

4. Other Methods of Detecting Incorrect Data

Reinterviews and outlier reviews are the most common methods of identifying incorrect or falsified survey data, but other methods of detecting them have also been proposed. For example, Biemer and Stokes (1989) report that in 1982 the Census Bureau started collecting information on the interviewers it caught cheating in order to develop a profile of the people and situations in which cheating was found. One of the Census Bureau's findings was that most cheating occurred with new interviewers who had worked for the Census Bureau for less than one year. Biemer and Stokes used this information to develop a model for improving the detection of interviewer cheating.

Another method is to compare the survey results obtained by different interviewers. Turner et al.
(2000) presented a case study in which falsified survey data were detected in an epidemiologic survey when one of the interviewers was observed to have an unusually high interview yield. Most interviewers were successful in obtaining interviews from about 30% of the sample households, while one interviewer had a success rate of 85%. A review of the interviewer's results along with numerous reinterviews showed that much of the data were falsified. Further examinations of the data turned up more interviewers with falsified data. When their data were examined it was observed that not only were their response rates higher than normal, but the fabricated data were different as well. For example, interviewers whose data were believed to be accurate showed 50% of all households in the survey's sample having one eligible adult, while interviewers whose data were believed to be fabricated showed almost 70% of the households having one eligible adult. As a result of their experience with this survey Turner et al. advocate examining the incoming data on a daily
basis in order to catch clues of potential data falsification as soon as possible.

5. Benford's Law

Another method of identifying incorrect data that has received a lot of attention in recent years is called Benford's Law. The method is named for Frank Benford, an American physicist who published a paper in 1938 describing a curious property that large collections of real-world numbers tend to have: the leading (or leftmost) digit of the numbers is more likely to be small than large. Specifically, he found that the proportion of real-world numbers whose leading digit is d = 1, 2, 3, ..., 9 is approximately \(\log_{10}\left(\frac{d+1}{d}\right)\). This phenomenon is called Benford's Law.

Hill (1995) published a paper with the first rigorous mathematical explanation of why the leading digits in many data sets follow Benford's Law. Hill offered several explanations. One of his explanations involved a type of central limit theorem in which several probability distributions are chosen at random from a large collection of probability distributions. Then several random variables are chosen from each of the selected distributions. Under these conditions Hill proved that the leading digits of the numbers follow Benford's Law. This can be written mathematically as:

\[ P\{x = d\} = \log_{10}\left(\frac{d+1}{d}\right) \]

where x is the leading digit of a randomly selected number.

For example, in the CE survey respondents report their expenditures on a large number of item categories, with each item category having a different distribution of expenditures. Then within each item category the respondents report several expenditures. Thus we have several different probability distributions, and several random variables are chosen from each distribution, so the conditions described by Hill are satisfied. As a consequence the leading digits of all the expenditures reported on the CE's survey forms should follow Benford's Law.

6. Applications of Benford's Law in Other Areas

Modern applications of Benford's Law began in 1992, when Mark Nigrini examined the distribution of leading digits he found in some sales and expense data for his doctoral thesis. The data he examined followed Benford's Law quite closely. After that initial success, Nigrini continued to use Benford's Law to examine other business and financial data. For example, he used it to examine the expense claims of a nationwide chain of motels, where he uncovered approximately one million dollars of fraudulent claims.

Then in 1996 Nigrini examined IRS tax return data and found that the leading digits of the line items "Interest Paid" and "Interest Received" followed Benford's Law. His tax return study was published in the Journal of the American Taxation Association. Next Nigrini examined the leading digits of the numbers contained in President Clinton's tax returns for the years [...]. Nigrini found that the leading digits followed Benford's Law, so he concluded that President Clinton's tax returns were honest.

These and other studies conducted by Nigrini generated a lot of interest within the accounting industry, and today the accounting industry is the largest business sector using Benford's Law to detect fraudulent data. Nigrini's work also led tax agencies in several countries around the world, as well as several U.S. states, including California, to use Benford's Law to detect fraudulent data on tax returns. Finally, the scientific community is occasionally rocked by studies that turn out to contain falsified data, and Benford's Law is starting to be used there to detect such falsified data.

7. Leading Digit Patterns in CE Data

The table below shows expenditure data collected by the CEQ survey in the year [...]. The survey collected data on 734,684 expenditures. The table shows that the leading digits of those expenditures follow Benford's Law quite closely. According to CEQ data, 30.5% of the leading digits were 1's, while Benford's Law predicted the percentage to be 30.1%.
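The percentage of leading digits equal to 2 was 19.3% in the CEQ data, while Benford's Law predicted the percentage to be 17.6%.

Table 1. Comparison of CEQ Data with Benford's Law

Leading      Reported Expenditures         Benford's Law
Digit (d)    Number       Percent (SE)     log10((d+1)/d) %
1            223,[...]    30.5 (.063)      30.1
2            141,[...]    19.3 (.053)      17.6
3            90,[...]     [...] (.045)     12.5
4            66,[...]     [...] (.040)     9.7
5            76,[...]     10.4 (.044)      7.9
6            50,[...]     [...] (.034)     6.7
7            35,[...]     [...] (.029)     5.8
8            32,[...]     [...] (.028)     5.1
9            18,[...]     [...] (.021)     4.6
Total        734,684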
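
As an illustration of how a comparison like the one in Table 1 can be set up, the following Python sketch extracts leading digits and tabulates them against the percentages given by Benford's Law. The function names and the sample amounts are hypothetical and are not part of the CE processing system; the sketch assumes strictly positive dollar amounts.

```python
import math
from collections import Counter


def leading_digit(amount):
    """Return the leftmost nonzero digit of a positive amount (hypothetical helper)."""
    if amount <= 0:
        raise ValueError("leading digit is only defined here for positive amounts")
    # Scientific notation puts the leading digit first,
    # e.g. 2.99 -> '2.990000000000000e+00'.
    return int(f"{amount:.15e}"[0])


def benford_percentages():
    """Expected percentage for each leading digit d = 1..9: 100 * log10((d + 1) / d)."""
    return {d: 100 * math.log10((d + 1) / d) for d in range(1, 10)}


def observed_percentages(amounts):
    """Observed percentage of each leading digit among the reported amounts."""
    counts = Counter(leading_digit(a) for a in amounts)
    total = sum(counts.values())
    return {d: 100 * counts.get(d, 0) / total for d in range(1, 10)}


# Made-up expenditures, for illustration only (not CE data).
sample = [2.99, 25.00, 50.00, 118.40, 1350.00, 19.95, 7.25, 300.00, 12.00, 0.89]
expected = benford_percentages()
observed = observed_percentages(sample)
for d in range(1, 10):
    print(f"{d}  observed {observed[d]:5.1f}%  Benford {expected[d]:5.1f}%")
```

With only a handful of amounts the observed percentages will be far from the expected ones; a comparison of this kind is meaningful only for large collections of reported numbers.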
Although the CEQ data follow Benford's Law quite closely for some digits, a detailed examination of the data reveals a slight excess of 2's and 5's, and a slight shortage of 9's. In the CEQ data 19.3% of the leading digits were 2's, while Benford's Law predicted the percentage to be 17.6%. Likewise, 10.4% of CEQ's leading digits were 5's, while Benford's Law predicted it to be 7.9%. This slight excess of 2's and 5's is usually attributed to respondents rounding their expenditures to numbers such as $25 or $50, but it might also represent fraudulent (or curbstoned) data in which field representatives created data that tended to start with 2's and 5's.

The low percentage of 9's is curious because their shortage cannot be attributed to rounding numbers either up or down. The CEQ data have fewer 8's than Benford's Law predicts, so the expenditures are probably not rounded down, and there are not enough 1's to account for rounding up (the 1's exceed Benford's prediction by 0.4 percentage points, but there is a 2.1 percentage point shortage of 9's).

The standard errors in Table 1 are equal to the square root of the average of 100 random-group variance estimates, where each random-group variance estimate is based on randomly partitioning the set of sample consumer units into 50 groups. The point estimates and variance estimates reported in Table 1 are unweighted. Table 2 compares the unweighted estimates with the weighted estimates. The standard errors in the last column of Table 2 are obtained by the balanced repeated replication (BRR) method with 44 replicate weights.

Table 2. Comparison of Unweighted and Weighted Ratio

Leading      Number       Percent (SE)     Percent (SE)
Digit (d)                 unweighted       weighted
1            223,[...]    30.5 (.063)      30.4 (.072)
2            141,[...]    19.3 (.053)      19.3 (.081)
3            90,[...]     [...] (.045)     12.3 (.057)
4            66,[...]     [...] (.040)     9.0 (.047)
5            76,[...]     10.4 (.044)      10.5 (.056)
6            50,[...]     [...] (.034)     6.8 (.046)
7            35,[...]     [...] (.029)     4.8 (.035)
8            32,[...]     [...] (.028)     4.4 (.041)
9            18,[...]     [...] (.021)     2.5 (.033)
Total        734,684

The ratio of these two standard errors, \(\left(SE_{BRR}/SE_{RG}\right)^2\), can be viewed as a design effect (deff).
The deffs range from 1.32 to [...]. Note that these deffs were calculated from standard errors with more digits than the reported ones. When the intraclass correlation coefficient is 0, we can derive \(\mathrm{deff} = 1 + CV^2\), where CV is the coefficient of variation of the weights. This derivation is from the formula that Kish proposed to determine the design effect in order to incorporate the effects due to both weighting and clustered selection. Gabler et al. (1999) justified the formula.

Figure 1 displays a quantile-quantile plot of Fisher Z, where the standardized Fisher Z is defined as follows:

\[ \hat{Z}_{ij} = \sqrt{n-3}\,\left[\tanh^{-1}\left(\hat{\rho}_{ij}\right) - \tanh^{-1}\left(-\sqrt{\frac{\hat{p}_i\,\hat{p}_j}{\hat{q}_i\,\hat{q}_j}}\right)\right] \]

where \(\hat{\rho}_{ij}\) is the correlation coefficient between the weighted proportions \(\hat{p}_i\) and \(\hat{p}_j\) of leading digits i and j for \(i \ne j\), and \(\hat{q}_i = 1 - \hat{p}_i\). We computed \(\hat{\rho}_{ij}\) from a covariance obtained by the balanced repeated replication method with 44 replicate weights. Since we have 40 degrees of freedom in our example data, n equals 41.

Note that \(-\sqrt{\hat{p}_i\hat{p}_j/(\hat{q}_i\hat{q}_j)}\) is a consistent estimator of the correlation of \(\hat{p}_i\) and \(\hat{p}_j\) for \(i \ne j\) under the multinomial model. Therefore the difference between the two terms should converge to 0 if the multinomial model is satisfied. On the other hand, a large absolute value of \(\hat{Z}_{ij}\) means that the absolute value of \(\hat{\rho}_{ij}\) is large, i.e., \(\hat{p}_i\) and \(\hat{p}_j\) are highly correlated in our example data.

Figure 1 suggests that the principal deviation from normality is that the third through thirty-third values of \(\hat{Z}\) are smaller than the associated quantiles of the standard normal distribution. One should be cautious not to overinterpret this result because the \(\hat{Z}\) values are not independent. However, this quantile-quantile plot is consistent with a mixture model in which some interviewers have a digit reporting profile that differs from those of the other interviewers. This suggests that further investigation of negative correlations associated with mixtures of multinomial distributions would be of interest.

Figure 1. Quantile-quantile plot of the standardized Fisher Z (vertical axis: Zhat; horizontal axis: Standard Normal Quantile).
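
To make the two design-effect expressions discussed in this section concrete, here is a small Python sketch. It recomputes the squared ratio of standard errors from the values printed in Table 2, and separately evaluates 1 + CV² for a set of weights. The function names and the example weights are assumptions made for illustration. Because the printed standard errors are rounded, the ratios computed here will not exactly reproduce the range reported in the text.

```python
# Standard errors (percentage points) for leading digits 1..9, as printed in Table 2.
SE_RANDOM_GROUP = [0.063, 0.053, 0.045, 0.040, 0.044, 0.034, 0.029, 0.028, 0.021]  # unweighted
SE_BRR = [0.072, 0.081, 0.057, 0.047, 0.056, 0.046, 0.035, 0.041, 0.033]           # weighted


def deff_from_standard_errors(se_brr, se_rg):
    """Squared ratio of the BRR standard error to the random-group standard error."""
    return (se_brr / se_rg) ** 2


def kish_deff(weights):
    """Design effect due to unequal weighting alone: 1 + CV^2 of the weights."""
    n = len(weights)
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / n  # population variance
    return 1 + variance / mean ** 2


deffs = [deff_from_standard_errors(b, r) for b, r in zip(SE_BRR, SE_RANDOM_GROUP)]
for digit, deff in enumerate(deffs, start=1):
    print(f"digit {digit}: deff = {deff:.2f}")

# Hypothetical weights, for illustration only.
example_weights = [1.0, 1.0, 2.0, 4.0]
print(f"1 + CV^2 for the example weights: {kish_deff(example_weights):.3f}")
```

The population form of the variance is used so that 1 + CV² equals n Σw² / (Σw)², the usual way Kish's weighting factor is written.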
8. Identifying the Source of Unusual Data

Benford's Law can be used to help identify sources of unusual data. For example, suppose a field representative is suspected of curbstoning. Many studies (e.g., Browne, 1998) have shown that people tend to be bad at fabricating realistic data, so one way of identifying curbstoners is to see whether their data follow Benford's Law. If they do, then the field representative is probably collecting accurate data. If they do not, then the field representative may be fabricating at least some of the data.

If the data from each field representative are viewed as arising from a simple random sample, then Pearson's chi-square test statistic may be helpful in determining whether a field representative's collected data follow Benford's Law:

\[ \theta = n \sum_{d=1}^{9} \frac{\left(\hat{p}_d - p_d\right)^2}{p_d} \]

where n = the number of expenditures reported by a particular field representative, \(\hat{p}_d\) = the proportion of those expenditures whose leading digit is d, and \(p_d = \log_{10}\left(\frac{d+1}{d}\right)\). This statistic is a goodness-of-fit measure that has a chi-square distribution with 9 - 1 = 8 degrees of freedom.

An alternative test statistic is the same formula, but where \(p_d\) is the proportion of all numbers collected in the survey whose leading digit is d. This alternative definition of \(p_d\) is computed from the complete universe of data collected from all field representatives. It takes into consideration the fact that Benford's Law may not hold exactly for a particular data set. It also assumes that the vast majority of field representatives are honest, so that the value of \(p_d\) estimated from the complete universe of collected data is close to the true value of \(p_d\). This is sometimes called digital analysis.

Table 3 shows an example of CEQ data from a typical field representative (θ = 10.39) and from an unusual field representative (θ = [...]), using CE's complete set of collected data to estimate \(p_d\). The data in Table 3 show that the unusual field representative has a large number of 5's and 6's.
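
The test statistic θ defined above can be computed directly from a field representative's leading-digit counts. The Python sketch below is one minimal way to do so. The function names and the example counts are hypothetical, and the tail probability uses the closed form of the chi-square survival function for 8 degrees of freedom rather than a statistics library.

```python
import math

# Benford proportions p_d = log10((d + 1) / d) for d = 1..9.
BENFORD = [math.log10((d + 1) / d) for d in range(1, 10)]


def pearson_theta(digit_counts, expected=BENFORD):
    """theta = n * sum_d (phat_d - p_d)^2 / p_d, for counts of leading digits 1..9."""
    n = sum(digit_counts)
    return n * sum((c / n - p) ** 2 / p for c, p in zip(digit_counts, expected))


def chi2_upper_tail_8df(theta):
    """P(X > theta) for X distributed chi-square with 8 degrees of freedom.

    For even degrees of freedom 2m the upper tail has the closed form
    exp(-x/2) * sum_{i < m} (x/2)^i / i!; here m = 4.
    """
    half = theta / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))


# Made-up digit counts for one field representative (digits 1..9), illustration only.
counts = [310, 200, 130, 100, 150, 90, 55, 50, 30]
theta = pearson_theta(counts)
print(f"theta = {theta:.2f}, upper-tail probability = {chi2_upper_tail_8df(theta):.3g}")
```

Passing the survey-wide digit proportions as `expected` instead of the Benford proportions gives the alternative form of the statistic.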
When 1,132 expenditures are reported, the percentage of leading digits equal to 5 should be approximately 10.4% ± 1.8%, but 17.2% of that field representative's leading digits are 5's. Likewise, the percentage of leading digits equal to 6 should be approximately 6.8% ± 1.5%, but 10.5% of that field representative's leading digits are 6's. These unusual results suggest that the field representative may have fabricated some of the data. These confidence intervals are computed as \(\pm 2\,SE_p\).

Table 3. An Example of Data from Typical and Unusual Field Representatives

Leading      CEQ's Nationwide     A Typical FR     An Unusual FR
Digit (d)    Distribution         (θ = 10.39)      (θ = [...])
             (n=734,684)          (n=1,143)        (n=1,132)
[...]
Total

The chi-square distribution with 9 - 1 = 8 degrees of freedom has a mean of 8.0 and a standard deviation of 4.0, and only 1 out of every 1,000,000 field representatives should have a test statistic greater than [...]. However, an examination of the CEQ data reveals 5 field representatives with test statistics greater than 42.7, and 1 field representative with a test statistic greater than [...]. This is strong evidence that some of the field representatives' data do not follow the expected distribution. Their data are suspicious, and those field representatives should be investigated to determine whether they are curbstoning.

9. Conclusion

Benford's Law is a simple and powerful tool that can be used to help identify possibly fraudulent or error-prone data in many settings, including sample surveys. It is important to identify incorrect survey data because the accuracy of any survey's results depends on the accuracy of the collected data. Although Benford's Law was first discovered 120 years ago, it has been rapidly gaining popularity over the past decade. Its newfound popularity is mostly in the accounting and auditing industries, but there is great potential for its use in the field of sample surveys as well. In fact, the universality with which it applies to nearly every real-world data set is one of its more curious and powerful aspects.
Although Benford's Law can be a powerful tool in identifying falsified survey data, we hasten to point out that it really only identifies unusual data. As with any statistical or quality control tool, after the unusual data have been identified they must be
examined to determine whether or not they are accurate. Benford's Law is a potentially powerful tool that can be added to other quality control tools used in the world of surveys to increase the accuracy of the data.

10. References

Benford, Frank (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, vol. 78, pp. [...].

Biemer, Paul P., and Stokes, S. Lynne (1989). The Optimal Design of Quality Control Samples to Detect Interviewer Cheating. Journal of Official Statistics, vol. 1, pp. [...].

Browne, Malcolm W. (1998). Following Benford's Law, or Looking Out for No. 1. New York Times, August 4, 1998.

Bureau of Labor Statistics (1997). BLS Handbook of Methods. U.S. Department of Labor.

Gabler, S., Haeder, S., and Lahiri, P. (1999). A Model Based Justification of Kish's Formula for Design Effects for Weighting and Clustering. Survey Methodology, vol. 25, pp. [...].

Hill, Theodore (1995). Base-Invariance Implies Benford's Law. Proceedings of the American Mathematical Society, vol. 123, pp. [...].

Hill, Theodore (1995). The Significant-Digit Phenomenon. American Mathematical Monthly, vol. 102, pp. [...].

Hill, Theodore (1995). A Statistical Derivation of the Significant-Digit Law. Statistical Science, vol. 10, pp. [...].

Hill, Theodore (1996). The First-Digit Phenomenon. American Scientist, vol. 86, pp. [...].

Hill, Theodore (1997). Benford's Law. Encyclopedia of Mathematics Supplement, vol. 1, p. [...].

Nigrini, Mark (1996). A Taxpayer Compliance Application of Benford's Law. Journal of the American Taxation Association, vol. 18, pp. [...].

Turner, C.F., J.N. Gribble, A.A. Al-Tayyib, J.R. Chromy (2000). Falsification in Epidemiologic Surveys: Detection and Remediation. Technical Papers on Health and Behavior Measurement, No. 53. Washington, DC: Research Triangle Institute.