T1 Estimates SAT - 2006
Taxi and Limousine Service (TL)
NAICS: 4853**
by Javier Oyarzun
BSMD, Statistics Canada
007-1-1
1. Introduction

1.1 Use of administrative data

Over the last few years, Statistics Canada (STC) has been increasing the use of administrative data in its survey programs. Administrative data can be used for various purposes: creating and updating sample frames, direct tabulation or analysis (when administrative data are used instead of a survey), estimation (e.g., administrative data can be used as auxiliary variables in a regression model), evaluation (survey data are compared with administrative data) and direct replacement of survey data.

There are several sources of administrative data. Good examples include import/export files, registers of vital statistics such as births and deaths, income tax and Employment Insurance files, and administrative records of educational, health care and justice institutions. One of the main benefits of using administrative data is that it reduces the burden on survey respondents. That is why they are used in the unincorporated business estimates program, more commonly referred to as the T1 program.

1.2 Definition of an unincorporated business

Every Canadian who earns income on which tax must be paid to the government is required to complete an income tax return (a T1 form) at the end of the year in which the income was earned. If professional, business, commission, rental, farming or fishing income is reported on the T1 form for a given year, the individual is considered an unincorporated business. If a business has employees and has to make salary deductions, or if a business has to collect goods and services tax (GST) because it makes more than $30 000 per year, it must have a Business Number (BN) used by the Canada Revenue Agency (CRA). Otherwise, the business is uniquely identified only by the individual's social insurance number (SIN).

1.3 Information provided by the CRA

Each year, STC receives two major files from the CRA. One of them, the Assessed Record File (ARF), contains information about all individuals who reported an amount other than zero for at least one of the six types of income listed in section 1.2. The main variables in the file are gross income and net income for the six income sources. In 2006, there were just over 3.6 million unincorporated businesses in Canada.
The other file, known as the E-File, is a subset of the ARF. It contains the information for all unincorporated businesses that submitted their data to the CRA electronically. About 50 variables are available in the file. In 2006, about 56% of respondents reported their data electronically.

1.4 The T1 estimates program

1.4.1 Estimates for the UES

The T1 estimates program has two components. One component is related to the Unified Enterprise Survey (UES). For this component, the aim of the T1 estimates program is to provide estimates of totals for about 30 variables in a number of different domains for about 40 surveys. The UES is based on several principles: it uses common generic processing systems and methods and an integrated questionnaire with harmonized variables and concepts, and it is supported by the Business Register (BR), a central database created and maintained at STC.

To be included in the BR, a T1 must meet one of the following conditions: make salary deductions, or have a GST account. In 2006, just over 75 000 T1s satisfied one or both of the conditions. Units that are included in one or more of the 40 surveys are identified, and a SIN is associated with each one. Units for which there is no SIN are excluded from the target population. In general, we manage to find a SIN for about 95% of the units. The estimates produced for this component are based on the BR and hence do not provide a complete picture of all T1s in Canada.
1.4.2 Satellite estimates

The second component is referred to as satellite estimates. They are produced for three surveys: Taxi and Limousine Service Industry, Couriers and Local Messengers Industry, and Real Estate Rental and Leasing and Property Management. A large proportion of the T1s for these industries have neither employees nor a GST account and therefore are not included in the BR. As a result, the estimates produced by the UES do not take these units into account even though they make up a significant part of each industry's total revenue. To remedy this problem, we use the ARF (see section 1.3), and not the BR, to define the target population. Then, using the E-File (electronic taxfilers), we create statistical models to estimate the portion for which information is unavailable (paper taxfilers). The resulting estimates are known as satellite estimates. Since 2005, satellite estimates have also been generated for all NAICS codes. Those estimates are used by Industry Accounts Division (IAD), Service Industries Division and Transportation Division.

1.4.3 Comparability of UES estimates and satellite estimates

For a given NAICS code, the estimates produced for the UES and the satellite estimates are never comparable. The main reason is that the target population is defined completely differently in the two estimation systems. In general, we expect that satellite estimates will be higher than UES estimates. Yet, because the estimation methods are different and because the ARF and the BR can have different NAICS codes for the same unit, it is possible that for certain variables estimated for small domains, the satellite estimate will be lower than the UES estimate.

1.4.4 BR undercoverage for T1

The Business Register (BR) is a structured list of businesses that produce goods and services in Canada. The list is used by statistical programs to determine the in-scope population, select samples, support collection activities, monitor respondent burden and support or perform business demographic analysis. The BR undercovers the T1 population because it contains only enterprises that have at least one employee and/or at least $30 000 in GST sales. In other words, the BR is missing the units that have no employees and GST sales of under $30 000.
Figure 1.4.4: T1 coverage portions (the dark grey is the portion not covered by the BR). Labels in the figure: Royce-Maranda threshold; BR; TX; TN; ARF/E-File; SUPP; units not covered by the BR (enterprises under $30 000 and without employees). On the ARF file, approximately 69% of the units are under $30 000.
2. Processing the ARF and the E-File

2.1 Imputing NAICS codes

An important variable in the ARF is the NAICS code, i.e., the industry code for the T1 business as reported by the taxfiler. When STC receives the ARF, about 30% of the NAICS codes are missing or incorrect (the majority of them are coded 000000). Since estimates are required for all NAICS6 x PROVINCE combinations, a valid NAICS code is needed for each unit to avoid underestimation. If the code is missing or incorrect, it is imputed. The imputation process consists of two steps. First, an attempt is made to find a valid NAICS code (at least two digits) with information from other sources; then the third, fourth, fifth and sixth digits are imputed probabilistically if necessary. The process is described below.

(a) E-File

It is often the case that an individual has more than one unincorporated business in more than one industry. If so, the individual may file as many returns as he or she has businesses. Each one will appear as a separate entry in the E-File, and a valid NAICS code will generally be associated with each return. When that occurs, the NAICS code in the ARF is 000000. In that case, the ARF NAICS code is imputed with the NAICS code of the tax return with the highest total gross income. At this point, the imputed NAICS code may have two, three, four, five or six digits. Just over 50% of the missing NAICS codes are imputed in this manner.

(b) Previous years' ARF

When a valid NAICS code cannot be found in the E-File, the ARF for the previous year is used. If no valid NAICS code is found, the ARF from two years earlier is used. This method results in the imputation of nearly 2% of the missing NAICS codes (two, three, four, five or six digits).

(c) BR

Step 3 is to use the BR. With a SIN-BN concordance file generated by Tax Data Division (TDD), a BN can be associated with a number of units in the ARF. With the BN, the ARF can be matched with the BR to obtain a NAICS code for a number of units. About 7% of the ARF units are imputed with a NAICS code from the BR.

(d) Primary activity

Indirectly, the type of income reported in the ARF may provide a good indication of the business's industry.
For example, more than 94% of T1s in the ARF whose main source of income is rent have a NAICS6 code of 531111. It is therefore reasonable to assume that 94% of the units in the ARF that have a missing NAICS6 and have reported rental income as their primary activity have a NAICS6 code of 531111. The imputation method used involves assigning a random number between 0.0000 and 100.0000 to the units in the ARF whose main source of income is rent and whose NAICS code is unknown. We also generate a cumulative frequency table (see below) showing the distribution of NAICS codes for units that reported rent as their main source of income.
Table 2.1: Distribution of units whose main source of income is rent

NAICS     Frequency   Percentage   Cumulative frequency   Cumulative percentage
110000          404       0.047                    404                   0.047
...
530000            4       0.0005                21 536                  2.5188
531000            7       0.0008                21 543                  2.5196
531100           13       0.0015                21 556                  2.5211
531110            4       0.0005                21 560                  2.5216
531111      804 880      94.1359               826 440                 96.6575
...
913130            3       0.0004               855 019                100.000

For example, if a random number between 2.5188 and 2.5196 is chosen for a given unit whose NAICS code has to be imputed, the imputed NAICS code will be 531000. If the random number is between 2.5216 and 96.6575, the imputed NAICS code will be 531111. Thus it is easy to see that the distribution of the imputed NAICS codes will be the same as that of the non-imputed NAICS codes. The process is repeated for the other five types of income (farming, fishing, commission, business, professional).

Imputation results

Table 2.1.2: Results of steps a, b, c and d (cases where the NAICS code was missing or invalid in the ARF)

Method              Frequency    Percent
E-File                535 636    52.52 %
ARF 2005               21 161     2.07 %
ARF 2004                4 620     0.45 %
BR                     45 260     4.44 %
Primary activity      413 140    40.51 %
TOTAL               1 019 817      100 %

(e) NAICS3 to NAICS6

At this stage, all records have at least a valid two-digit NAICS code. We now have to make sure that the third, fourth, fifth and sixth digits are present and valid; if not, they are imputed. We start with the third digit. We proceed as in step d; that is, we assign a random number to all records with a missing third digit. We also generate a cumulative frequency table showing the distribution of the NAICS3 for each NAICS2. This ensures that the imputed NAICS3 will have the same distribution as the non-imputed NAICS3. The operation is repeated for the NAICS4, NAICS5 and NAICS6. It is important to note that information about the imputation results is retained so that we can always find out whether a NAICS code was imputed, what source was used, and which digits were imputed.
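The random-number imputation of steps d and e can be sketched as follows. This is a minimal illustration under stated assumptions, not the production system: the function names, the donor counts and the seed are hypothetical, and the counts are not those of Table 2.1. A cumulative percentage table is built from the units with a valid code, a number between 0 and 100 is drawn for each unit to impute, and the unit receives the code whose cumulative interval contains the draw.

```python
import bisect
import random
from itertools import accumulate


def build_cumulative_table(frequencies):
    """Turn {code: count} into parallel lists of codes and cumulative percentages."""
    codes = sorted(frequencies)
    total = sum(frequencies.values())
    running = accumulate(frequencies[code] for code in codes)
    cumulative = [100.0 * count / total for count in running]
    cumulative[-1] = 100.0  # guard against floating-point drift in the last bound
    return codes, cumulative


def impute_code(codes, cumulative, rng):
    """Draw a number between 0 and 100 and return the code whose interval contains it."""
    draw = rng.uniform(0.0, 100.0)
    index = bisect.bisect_right(cumulative, draw)
    return codes[min(index, len(codes) - 1)]


# Hypothetical donor counts for one income type (not the figures of Table 2.1).
donor_counts = {"110000": 2, "531111": 94, "531120": 4}
codes, cumulative = build_cumulative_table(donor_counts)

rng = random.Random(2006)  # fixed seed so the sketch is reproducible
imputed = [impute_code(codes, cumulative, rng) for _ in range(20_000)]
share_531111 = imputed.count("531111") / len(imputed)
print(f"share imputed to 531111: {share_531111:.3f}")
```

With enough units to impute, the share assigned to each code approaches its share among the donors, which is the property the text relies on. For step e, the same two functions would be applied with one table per valid prefix (for example, one NAICS3 table for each NAICS2).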
Imputation results

Table 2.1.3: Results of step e (cases where digits 3, 4, 5 or 6 are missing following steps a, b, c and d)

Digits imputed                       Cases where a NAICS was     Cases where a NAICS was initially
                                     initially invalid in ARF    missing or invalid in the ARF
Digits 3, 4, 5 and 6 were imputed                    143 435                               67 839
Digits 4, 5 and 6 were imputed                       258 500                              171 651
Digits 5 and 6 were imputed                           91 029                               38 543
Digit 6 was imputed                                  115 635                               58 256
TOTAL                                                608 599                              336 289

Note: The records in the last four columns of Table 2.a are also in the last column of Table 2.b above.

In 2006, there were 3 685 191 records in the ARF. For 1 050 204 of those records (28.5%), the entire NAICS code was imputed; for 143 435 records (3.9%), only the third to sixth digits were imputed; for 258 500 records (7.0%), only the fourth to sixth digits were imputed; for 91 029 records (2.5%), only the fifth and sixth digits were imputed; and for 115 635 records (3.1%), only the sixth digit was imputed. In all, at least one digit of the NAICS code was imputed for 1 658 803 records (45%).

2.2 Identifying partnerships

A partnership exists when two or more individuals are partners in the same unincorporated business. For tax purposes, each partner must report the partnership's figures and not the figures for his or her share of the partnership. For example, if a couple owns a dwelling that brings in $5 000 a year in rental income, each partner must report $5 000 in gross income on his or her tax return. Each partner's share of the business is reflected only in his or her net income. To avoid overestimating the revenue and expenditures of unincorporated businesses, it is important to accurately identify such partnerships. However, not all of them can be identified with the information available in the CRA files.

A simple method was used to detect them. First, the NAICS code, region and total gross income variables were compared; if two or more individuals reported the same values for all three variables, they were deemed to be members of the same partnership.
Similarly, if two or more persons reported the same values for the NAICS code, region and gross income from primary activity (a primary activity, one of farming, fishing, rental, business, commission or professional, is identified for each NAICS code), they were deemed to be a partnership. To avoid detecting false partnerships, we consider only those records with a gross income of more than $10 000. Below that threshold, there is a high risk of finding partnerships that are not really partnerships. The partnership identification process is currently under review. A better method that uses more information is expected to be introduced within a few months.

2.3 Detecting outliers

Outliers are values that are erroneous or inconsistent with other data. A good example of an outlier is $999 999 999, which occurs in about 50 records in the total gross income field in the ARF. Outliers are excluded from the estimates and are not imputed. There are various methods of detecting them. In the case of the satellite estimates, a value is deemed to be an outlier if it is greater than $5 000 000. A more robust outlier detection system is in development. Studies will be carried out to identify the best method for use in subsequent years.
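The two screening rules of sections 2.2 and 2.3 can be sketched as below. This is only an illustration under assumptions: the record layout and field names (`naics`, `region`, `gross_income`) are hypothetical, the thresholds are the ones printed in the text, and only the first partnership rule (matching on total gross income) is shown. The second rule would be identical with gross income from the primary activity as the third key.

```python
from collections import defaultdict

PARTNERSHIP_MIN_GROSS = 10_000  # threshold quoted in section 2.2
OUTLIER_MAX_GROSS = 5_000_000   # threshold quoted in section 2.3


def flag_outliers(records):
    """Return one boolean per record: True when gross income exceeds the outlier cutoff."""
    return [record["gross_income"] > OUTLIER_MAX_GROSS for record in records]


def find_partnerships(records):
    """Group record indexes that share NAICS code, region and total gross income."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        # Records at or below the threshold are ignored to limit false matches.
        if record["gross_income"] > PARTNERSHIP_MIN_GROSS:
            key = (record["naics"], record["region"], record["gross_income"])
            groups[key].append(index)
    return [members for members in groups.values() if len(members) >= 2]


# Hypothetical records, for illustration only.
records = [
    {"naics": "485310", "region": "QC", "gross_income": 25_000},
    {"naics": "485310", "region": "QC", "gross_income": 25_000},
    {"naics": "485310", "region": "ON", "gross_income": 25_000},
    {"naics": "485310", "region": "QC", "gross_income": 8_000},
    {"naics": "485310", "region": "QC", "gross_income": 8_000},
    {"naics": "485310", "region": "QC", "gross_income": 999_999_999},
]
partnerships = find_partnerships(records)
outlier_flags = flag_outliers(records)
```

In this example the first two records are grouped, the two $8 000 records are not (they fall below the threshold), and only the last record is flagged as an outlier.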
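3. Estimates

Estimates are produced for about 60 variables (for a list of estimated variables, see Appendix 1). Variables beginning with the letter L are available directly in the E-File. Variables beginning with C or D are not available directly in the E-File but are combinations of variables beginning with L. The estimates for the variables beginning with L were derived with statistical models. The models are constructed on the basis of electronic taxfilers (for whom all variables are available) and then applied to paper taxfilers.

The models were constructed at the NAICS6 level. That means that the parameters of the various models were derived for every possible NAICS6 grouping (NAICS6 codes beginning with 31, 32, 33 and 91 were modelled at the NAICS2 level because there were too few records for a number of NAICS6 codes). No model was built for NAICS6 codes with fewer than three electronic taxfilers. For those NAICS6 codes, it was possible to derive only variables L8299 and L9946, directly from the ARF. Two different models were used in the estimation process: simple linear regression estimation, and ratio estimation. The details are provided below.

(a) Variables L8299 and L9946

The following model was selected for these variables:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $\beta_0$, $\beta_1$ and $\varepsilon_i$ are the usual regression model parameters and $x_i$ is the total gross income variable for L8299 (for L9946, $x_i$ is the total net income variable), which is available for all units in the ARF. We want to estimate $Y = \sum_{i \in U} y_i$ using the predictive approach, where $U$ is the set of all units in the domain and $s$ is the set of $n$ electronic taxfilers among the $N$ units of $U$. Let $\hat{Y}$ be an estimator of $Y$. Thus we have

$$\hat{Y} = \sum_{i \in s} y_i + \sum_{i \in U - s} \hat{y}_i \quad \text{with} \quad \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i .$$

An estimator of the variance of $\hat{Y}$ under the linear regression model is derived as follows:

$$\hat{V}(\hat{Y}) = \frac{N^2}{n}\,(1 - f)\,\hat{\sigma}_\varepsilon^2 \left[ 1 + \frac{(\bar{x}_s - \bar{x}_U)^2}{(1 - f)\,\frac{1}{n}\sum_{i \in s}(x_i - \bar{x}_s)^2} \right]$$

where

$$\hat{\sigma}_\varepsilon^2 = \frac{\sum_{i \in s}(y_i - \hat{y}_i)^2}{n - 2} \quad \text{and} \quad f = \frac{n}{N} .$$

(b) Other variables beginning with L

A ratio model was preferred for these variables. In mathematical form, the model can be written as follows:

$$y_i = \beta_1 x_i + \varepsilon_i$$

where $\beta_1$ and $\varepsilon_i$ are the usual ratio model parameters. We want to estimate $Y = \sum_{i \in U} y_i$ using the predictive approach. Let $\hat{Y}$ be an estimator of $Y$. Thus we have

$$\hat{Y} = \sum_{i \in s} y_i + \sum_{i \in U - s} \hat{y}_i \quad \text{with} \quad \hat{y}_i = \hat{\beta}_1 x_i, \qquad \hat{\beta}_1 = \frac{\sum_{i \in s} y_i}{\sum_{i \in s} x_i} .$$

An estimator of the variance of $\hat{Y}$ under the ratio model is also derived. [Formula not legible in the source; its surviving terms are $N$, $n$, $(1 - f)$, means of $x$, and a residual variance $\hat{\sigma}^2$ based on $(y_i - \hat{y}_i)^2$.]

Another way of obtaining $\hat{Y}$ is the following equation:

$$\hat{Y} = \frac{\sum_{i \in s} y_i}{\sum_{i \in s} x_i} \sum_{i \in U} x_i$$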
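The two prediction estimators can be illustrated with a short sketch. It assumes that the sample is the set of electronic taxfilers (where both x and y are observed) and that the non-sample units are the paper taxfilers (where only x is observed). The function names and the toy figures are hypothetical, and the variance estimators are not covered.

```python
def regression_prediction_total(sample_x, sample_y, nonsample_x):
    """Observed total for the sample plus least-squares predictions for the rest."""
    n = len(sample_x)
    mean_x = sum(sample_x) / n
    mean_y = sum(sample_y) / n
    sxx = sum((x - mean_x) ** 2 for x in sample_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sample_x, sample_y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return sum(sample_y) + sum(beta0 + beta1 * x for x in nonsample_x)


def ratio_prediction_total(sample_x, sample_y, nonsample_x):
    """Observed total for the sample plus ratio predictions for the rest."""
    beta1 = sum(sample_y) / sum(sample_x)
    return sum(sample_y) + beta1 * sum(nonsample_x)


# Toy data: three electronic filers (x and y known), two paper filers (x only).
efile_x = [1.0, 2.0, 3.0]
paper_x = [4.0, 5.0]

regression_total = regression_prediction_total(efile_x, [3.0, 5.0, 7.0], paper_x)
ratio_total = ratio_prediction_total(efile_x, [2.0, 4.0, 6.0], paper_x)

# Alternative form of the ratio estimator: (sum of y / sum of x over the sample)
# multiplied by the sum of x over all units.
ratio_total_alt = (sum([2.0, 4.0, 6.0]) / sum(efile_x)) * sum(efile_x + paper_x)
```

The last line shows why the two expressions for the ratio estimator agree: the observed sample total equals the estimated ratio times the sample total of x, so adding the predicted part gives the ratio times the total of x over all units.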
(c) Variables beginning with C or D

Variables beginning with C or D are simply derived using variables beginning with L (see Appendix 2).

4. Results

In this section we present the results obtained using the satellite estimates for the Taxi and Limousine Service survey (TL: 4853**).

4.1 Number of T1 units

Year   Number of units
2006            38 943
2005            37 090
2004            36 778
2003            36 206
2002            34 128

4.2 NAICS imputation

The percentages in the last four columns are shares of the imputed units.

       Units (without       Units (with    Imputation         Imputation         Imputation       Imputation
Year   imputation)          imputation)    E-FILE             Historical ARF     BR               Activity
2006   27 791 (71.36 %)          38 943    4 436 (39.78 %)      161 (1.44 %)       739 (6.63 %)   5 816 (52.15 %)
2005   25 945 (69.95 %)          37 090    4 112 (36.90 %)      109 (0.98 %)     1 276 (11.45 %)  5 648 (50.68 %)
2004   24 944 (67.82 %)          36 778    3 765 (31.82 %)    1 515 (12.80 %)      852 (7.20 %)   5 702 (48.18 %)
2003   26 539 (73.30 %)          36 206    2 428 (25.12 %)    2 207 (22.83 %)      445 (4.60 %)   4 587 (47.45 %)
2002   24 802 (72.67 %)          34 128    2 022 (21.68 %)    2 567 (27.52 %)      241 (2.58 %)   4 496 (48.21 %)

4.3 Proportion of E-FILERS

Year   E-FILER             P-FILER             Number of units
2006   17 819 (45.76 %)    21 124 (54.24 %)             38 943
2005   16 007 (43.16 %)    21 083 (56.84 %)             37 090
2004   14 347 (39.01 %)    22 431 (60.99 %)             36 778
2003   13 160 (36.35 %)    23 046 (63.65 %)             36 206
2002   10 992 (32.21 %)    23 136 (67.79 %)             34 128
4.4 Proportion of partnerships

Year   Within a partnership   Not in a partnership   Number of units
2006   10 304 (26.46 %)       28 639 (73.54 %)                38 943
2005    9 771 (26.34 %)       27 319 (73.66 %)                37 090
2004    9 643 (26.22 %)       27 135 (73.78 %)                36 778
2003    9 486 (26.20 %)       26 720 (73.80 %)                36 206
2002    8 932 (26.17 %)       25 196 (73.83 %)                34 128

4.5 Outliers

Year   Number of units   Outliers         Number of units (ARF)
2006            38 943   157 (0.40 %)                    39 100
2005            37 090    87 (0.23 %)                    37 177
2004            36 778   101 (0.27 %)                    36 879
2003            36 206   448 (1.22 %)                    36 654
2002            34 128   364 (1.06 %)                    34 492

4.6 Results for revenue and expenses

Year   Number of units   C098 Revenue      C4699 Expenses
2006            38 943   1 133 495 182        913 439 773
2005            37 090   1 096 044 003          909 01 59
2004            36 778   1 073 743 058         873 63 411
2003            36 206   1 004 423 362        818 978 343
2002            34 128     939 595 636        748 597 435
4.7 BR undercoverage

4.7.1 UES and SAT estimates

       Number of units   Number of units   Difference   Proportion of
Year   (ARF)             (BR-SCF)          in units     undercovered units
2006            39 100            12 624       26 476              67.71 %
2005            37 177            12 422       24 755              66.59 %
2004            36 879            11 480       25 399              68.87 %
2003            36 654            10 264       26 390              72.00 %
2002            34 492                NA           NA                   NA

       C098 Revenue   C098 Revenue    Difference     Proportion of
Year   BR-UES         SAT             in revenue     undercovered revenue
2006    547 624 417   1 133 495 182   585 870 765                 51.69 %
2005    475 837 113   1 096 044 003   620 206 890                 56.59 %
2004    386 410 306   1 073 743 058   687 332 752                 64.01 %
2003    552 301 419   1 004 423 362   452 121 943                 45.01 %
2002             NA     939 595 636            NA                      NA

4.7.2 Units under $30 000 (BR undercoverage)

       Units reporting a         Units reporting a
Year   revenue below $30 000     revenue above $30 000     Number of units
2006   21 766 (55.89 %)          17 177 (44.11 %)                   38 943
2005   21 686 (58.47 %)          15 404 (41.53 %)                   37 090
2004   22 301 (60.64 %)          14 477 (39.36 %)                   36 778
2003   22 186 (61.28 %)          14 020 (38.72 %)                   36 206
2002   21 640 (63.41 %)          12 488 (36.59 %)                   34 128
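The undercoverage proportions in section 4.7.1 are the difference between the ARF-based figure and the BR-based figure, expressed as a share of the ARF-based figure. A minimal sketch of that arithmetic, using the 2004 row (the function name is hypothetical):

```python
def undercoverage_share(arf_based, br_based):
    """Share (in percent) of the ARF-based figure that the BR-based figure misses."""
    return 100.0 * (arf_based - br_based) / arf_based


# 2004 figures from section 4.7.1.
units_2004 = undercoverage_share(36_879, 11_480)
revenue_2004 = undercoverage_share(1_073_743_058, 386_410_306)

print(f"units:   {units_2004:.2f} %")
print(f"revenue: {revenue_2004:.2f} %")
```

The same function applies to any year for which both the ARF-based and the BR-based figures are available.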
Appendix 1: List of estimated variables

Variable name   Definition
L4007           Closing inventories, farm
L4008           Opening inventories, farm
L8000           Net sales
L8141           Real estate rental income (transferred to L8000)
L8230           Other income
L8290           Reserves deducted last year
L8299           Total revenue
L8300           Opening inventory
L8320           Purchases
L8340           Direct wages
L8360           Sub-contracts
L8450           Other costs
L8500           Closing inventory
L8518           Cost of goods sold
L8519           Gross profit
L8521           Advertising
L8523           Meals and entertainment
L8590           Bad debts expense
L8690           Insurance
L8710           Interest
L8760           Business taxes, fees, licences, etc.
L8810           Office expenses
L8811           Supplies
L8860           Legal, accounting and other professional fees
L8871           Management and administration fees
L8910           Rent
L8960           Maintenance and repairs
L8963           Boat repairs
L9060           Salaries, wages and benefits
L9062           Crew shares
L9136           Gear
L9137           Nets and traps
L9138           Bait, ice, salt
L9180           Property taxes
L9200           Travel
L9220           Telephone and utilities
L9224           Fuel costs
L9270           Other expenses
L9275           Delivery, freight and express
L9281           Motor vehicle expenses
L9368           Total business expenses
L9369           Net income (loss) before adjustments
L9923           Land additions
L9924           Land dispositions
L9925           Details of equipment additions
L9926           Details of equipment dispositions
L9927           Details of building additions
L9928           Details of building dispositions
L9931           Total business liabilities
L9932           Drawings
L9933           Capital contributions
L9935           Allowance on eligible capital property
L9936           Capital cost allowance
L9943           Other amounts deducted from share
L9945           Business-use-of-home expenses
L9946           Net income/loss
L9947           Recaptured CCA
L9948           Terminal loss
L9949           Total personal portion of expenses
L9950           Filer share amount
C3040S          L8340 + L9060: total salaries, wages and benefits
C450            L9935 + L9936: depreciation and amortization
C4599           L8299 - L9946 - L8710: total operating expenses
C4699           L8299 - L9946: total expenses
D9857           L9946 + L8710: operating profit
D9858           L8340 + L9060 + L9935 + L9936 + L9946 + L8710: value added
D9859           L8299 - (L8340 + L9060 + L9935 + L9936 + L9946 + L8710): intermediate inputs
D9876           L8710: non-operating expenses (interest)
DV_FU           L9220 + L9224: total fuel and utilities expenses
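The derived variables at the end of Appendix 1 are simple sums and differences of a few L variables. The sketch below computes them from descriptive inputs. The function, its argument names and the sample amounts are hypothetical; only the formulas follow the list above.

```python
def derive_variables(total_revenue, net_income, interest,
                     direct_wages, salaries, eligible_capital_allowance,
                     capital_cost_allowance):
    """Compute the derived variables of Appendix 1 from their L-variable inputs."""
    total_salaries = direct_wages + salaries
    depreciation = eligible_capital_allowance + capital_cost_allowance
    value_added = total_salaries + depreciation + net_income + interest
    return {
        "total_salaries": total_salaries,
        "depreciation": depreciation,
        "total_operating_expenses": total_revenue - net_income - interest,
        "total_expenses": total_revenue - net_income,
        "operating_profit": net_income + interest,
        "value_added": value_added,
        "intermediate_inputs": total_revenue - value_added,
        "non_operating_expenses": interest,
    }


# Hypothetical amounts for one unit, for illustration only.
derived = derive_variables(
    total_revenue=100_000, net_income=30_000, interest=2_000,
    direct_wages=10_000, salaries=15_000,
    eligible_capital_allowance=1_000, capital_cost_allowance=4_000,
)
```

By construction, value added and intermediate inputs always add up to total revenue, and total expenses exceed total operating expenses by exactly the interest amount.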
Appendix 2: Mapping given by Transportation Division

Variable   Mapping

C077       L8230 + L8141 + L9926 + L9928

C080       L8000 + L8290 + L8230 + L8141 + L9926 + L9928

C098       L8000 + L8290 + L8230 + L8141 + L9926 + L9928

C99        L8000 + L8290

C3041      L8340 + L9060 + L9062

C3088      L8360

C3399      L8810 + L8811 + L9220 + L9136 + L9137 + L9138 + (L8300 - L8500) + (L9925*0.5) + L9945 + (L8320*.40)

C4066      L9224

C4069      (L9281*.65)

C4101S     L8810 + L8811 + L9220 + L9136 + L9137 + L9138 + (L8300 - L8500) + (L9925*0.5) + L9945 + (L8320*.40) + L9224

C4115      L8910 + (L9281*.20)

C4140S     L8910 + (L9281*.20)

C4178      L8960 + L8963 + (L9281*0.15) + L9927

C40S       L8960 + L8963 + (L9281*0.15) + L9927

C4070      L9275

C4370S     L9200 + L9275 + L8360 + L8690 + L8521 + L8860 + L8523 + L8871 + (L8320*.20)

C4410      L8760 + L9180

C450       L9935 + L9936 + (L9925*0.5)

C4569      L9270 + L8590 + L9943 + L9945 + L8450 + L993 + L9924 + (L8320*.40)

D9875      L9270 + L8590 + L9943 + L9945 + L8450 + L993 + L9924 + (L8320*.40)

C4599      (L8340 + L9060 + L9062) + (L9281*.65)
           + (L8810 + L8811 + L9220 + L9136 + L9137 + L9138 + (L8300 - L8500) + (L9925*0.5) + L9945 + (L8320*.40) + L9224)
           + (L8910 + (L9281*.20))
           + (L8960 + L8963 + (L9281*0.15)) + L9927
           + (L9200 + L9275 + L8360 + L8690 + L8521 + L8860 + L8523 + L8871 + (L8320*.20))
           + (L8760 + L9180)
           + (L9935 + L9936 + (L9925*0.5))
           + (L9270 + L8590 + L9943 + L9945 + L8450 + L993 + L9924 + (L8320*.40))

C4630      L8710

D9876      L8710

C4699      (L8340 + L9060 + L9062) + (L9281*.65)
           + (L8810 + L8811 + L9220 + L9136 + L9137 + L9138 + (L8300 - L8500) + (L9925*0.5) + L9945 + (L8320*.40) + L9224)
           + (L8910 + (L9281*.20))
           + (L8960 + L8963 + (L9281*0.15) + L9927)
           + (L9200 + L9275 + L8360 + L8690 + L8521 + L8860 + L8523 + L8871 + (L8320*.20))
           + (L8760 + L9180)
           + (L9935 + L9936 + (L9925*0.5))
           + (L9270 + L8590 + L9943 + L9945 + L8450 + L993 + L9924 + (L8320*.40))
           + (L8710)

D980       (L8000 + L8290 + L8230 + L8141 + L9926 + L9928)
           - ((L8340 + L9060 + L9062) + (L9281*.65)
           + (L8810 + L8811 + L9220 + L9136 + L9137 + L9138 + (L8300 - L8500) + (L9925*0.5) + L9945 + (L8320*.40) + L9224)
           + (L8910 + (L9281*.20))
           + (L8960 + L8963 + (L9281*0.15) + L9927)
           + (L9200 + L9275 + L8360 + L8690 + L8521 + L8860 + L8523 + L8871 + (L8320*.20))
           + (L8760 + L9180)
           + (L9935 + L9936 + (L9925*0.5))
           + (L9270 + L8590 + L9943 + L9945 + L8450 + L993 + L9924 + (L8320*.40)))