Iteratoal Joural of Computer Iformato Systems ad Idustral Maagemet Applcatos. ISS 250-7988 Volume 5 (202) 3 pp. 59-66 MIR Labs, www.mrlabs.et/jcsm/dex.html ear eghbor Dstrbuto Sets of Fractal ature Marcel Ja Isttute of Computer Scece AS CR Dept. of olear Modelg Pod Vodáreskou Vží 2 82 07 Prague, Czech Republc e-mal: marcel@cs.cas.cz Abstract: Dstaces of several earest eghbors of a gve pot a multdmesoal space play a mportat role some tasks of data mg. Here we aalyze these dstaces as radom varables defed to be fuctos of a gve pot ad ts k-th earest eghbor. We prove that f there s a costat such that the mea k-th eghbor dstace to ths costat power s proportoal to the ear eghbor dex k the ts dstace to ths costat power coverges to the Erlag dstrbuto of order k. We also show that costat s the scalg expoet kow from the theory of multfractals. Keywords: earest eghbor, fractal set, multfractal, Erlag dstrbuto I. Itroducto There are dstct problems dealg wth the earest eghbor or several earest eghbors to a gve pot data mg methods ad procedures. A rather strage behavor of the earest eghbors hgh dmesoal spaces was studed from dfferet pots of vew. Oe of them s a oparametrc techue of probablty desty estmato multdmesoal data space [], [2]. Classfcato to two or more classes ad the probablty desty estmate usg several earest eghbors s a typcal task [], [2], [5], [6]. Also clusterg s a problem where terpot dstaces are commo use. Ths class of problems looks for the hghest ualty of the probablty desty estmato, whle effcecy ad speed are secodary. The other ssue s the problem of earest eghbor searchg. Ths task s terestg ad mportat database applcatos [7], [8], [9]. A typcal task of searchg large databases s searchg for other earest eghbor ueres. For ths class of problems, maxmal performace,.e. the speed of earest eghbor searchg, s a prmary task. The use of the k-earest eghbor approach for other applcatos was studed, also e.g. by [0], ad []. For the problem of searchg the earest eghbor large databases, the boudary pheomeo was studed [8]. It was show by the use of L max metrc why the actual performaces of the earest eghbor searchg algorthms ted to be much better tha ther theoretcal aalyses would suggest. The cause s just the boudary effect a hgh dmesoal space. I [9] t was foud that as dmesoalty creases, the dstace to the earest data pot approaches the dstace to the farthest data pot of the learg set. For probablty desty estmato by the k-earest -eghbor method, the best value of k must be carefully tued to fd optmal results. The value of k s also depedet o the total umber of samples of the learg (or trag) data set. Let a pot x (a uery pot [7] ) be gve ad let there be a ball wth ts ceter x ad cotag k pots of the learg set t. Let the volume of ths ball be V k. The, for the probablty desty estmate pot x t holds [] k / pk ( x). () V A smple expermet wth desty estmato at the ceter of a multdmesoal cube wth uformly dstrbuted pots wll show that startg from some k the value of p k (x) for larger k s ot costat, as t should be, but lesses. The boudary effect surprsgly flueces the k-th earest eghbor about whch everybody would say that ths eghbor s stll very ear to pot x. I both cases, a statstcs of terpot dstaces s of prmary terest. Oe ofte-used tool s the Rpley K-fucto [2] usually wrtte the form: K( - E(umber of further pots wth dstace r of a arbtrary pot), where s the testy of the process or the expected umber of evets per ut of area (assumed costat). The K-fucto has bee studed from dfferet pots of vew vast lterature. I some works, the pot process s lmted to two dmesos ad K-fucto as well [3], [4]. Marco ad Puech [4] summarze mportat features of the K-fucto cludg dfferet correctos of edge (boudary) effects. They summarze mportat features of the K-fucto cludg dfferet correctos of edge (boudary) effects. A edge effect meas that pots located close to the doma borders are problematc because a part of the crcle sde k MIR Labs, USA
60 Marcel Jra whch pots are supposed to be couted s outsde the doma. Result of gorg ths effect s uderestmatg K. D. Evas [5] studes a class of radom varables defed to be fuctos of a gve pot ad ts earest eghbors. If the sample pots are depedet ad detcally dstrbuted, the assocated radom varables wll also be detcally dstrbuted, but ot depedet. He shows that radom varables of ths type satsfy a strog law of large umbers, the sese that ther sample meas coverge to ther expected values almost surely as the umber of sample pots. Moreover, Evas troduced a terestg lemma that for every coutable set of pots R d, ay pot ca be the (frst) earest eghbor of at most t(dv dp ) other pots of the set, where V dp s the volume of the ut ball R d wth L p orm. Dealg wth K-fuctos, t was already Rpley [2] who poted out that the K-fucto shares some of the propertes of the terpot dstrbuto fucto, eve though t s ot a dstrbuto fucto because K( as r. Later o, Boett ad Pagao [6] troduced emprcal cumulatve dstrbuto fucto F (d) of dstaces d for total samples. Ths s, fact, a correlato tegral [7] oe of ts may forms. Boett ad Pagao the gave a proof that the scaled dstrbuto of F (d) computed at a fte set of values, d, d 2,... coverges to a multvarate ormal dstrbuto as. The same result was gve by a dfferet way already by Slverma [8]. The goal of ths study s to aalyze the dstaces of the earest eghbors from gve pot x ad the dstaces betwee two of these eghbors, the -th ad (-)-st the space of radomly dstrbuted pots. We show that the k-th earest eghbor dstace from a fxed pot (the uery pot) s dstrbuted accordg to modfed Erlag dstrbuto of order k. The modfcato depeds o that oe uses a varable that s eual to the dstace to the proper expoet power stead of the dstace aloe as a depedet varable. I ths paper we frst pot out some features of eghbor dstaces a multdmesoal space. Secod, we remd of the probablty dstrbuto mappg fucto, ad the dstrbuto desty mappg fucto. These fuctos map probablty dstrbuto of pots R to a smlar dstrbuto the space of dstaces, whch s oe-dmesoal,.e. R. Thrd, fluece of boudary effects o the probablty dstrbuto mappg fucto s show, ad the power approxmato of the probablty dstrbuto mappg fucto the form of (dstace) s troduced. We show that expoet s, fact, the scalg expoet kow from the theory of multfractals [9], [20], [2]. o-uform as well as uform dstrbutos are cosdered. I cocluso, we ca say that the earest eghbor space does ot look so strage as show [7], [8] whe we look at t from the pot of vew of a olear scale measured by sutable power of eghbors dstaces. II. ear eghbors Dstrbuto Problem The earest-eghbor-based methods usually use () for a probablty desty estmate ad are based o the dstaces of eghbors from a gve pot. Usg the eghbor dstaces or terpot dstaces for the probablty desty estmato should copy the features of the probablty desty fucto based o real data. The dea of most ear-eghbors-based methods as well as kerel methods [2] assumes a reasoable statstcal dstrbuto the eghborhood of the pot uesto. That meas that for ay pot x (the uery pot [7] ), the statstcal dstrbuto of the data pots x surroudg t s supposed to be depedet of the locato of the eghbor pots ad ther dstaces x from pot x. Ths assumpto s ofte ot met, especally for small data sets ad hgher dmesos. To llustrate ths, let us cosder Eucldea metrcs ad uformly dstrbuted pots cube (-0.5, + 0.5). Let there be a ball wth ts ceter the org ad the radus eual to 0.5. 3 Ths ball occupes 4 3π.0. 5 0.524,.e. more tha 52 % of that cube a three-dmesoal space, 0.080746,.e. 8 % of ut cube 6-dmesoal space, 0.0026 0-dmesoal space, ad 3.28e -2 40-dmesoal space. It s the see that startg by some dmeso, say 5 or 6 ad some dex, the -th earest eghbor does ot le such a ball aroud pot x but somewhere the corer of that cube but outsde ths ball. Drawg a larger ball, we ca see that some parts of t are empty, especally parts ear to ts surface. I farther places from the org, the space thus seems to be less dese tha ear the org. It follows that ths -th eghbor les farther from pot x as would follow from the supposed uformty of dstrbuto. The fucto f ( ) r, where r s the mea dstace of the -th eghbor from pot x, should grow learly wth dex the case of uform dstrbuto wthout the boudary effect metoed. I the other case, ths fucto grows faster tha learly. There s some ukow dstrbuto of pots the eghborhood of pot x. If ths dstrbuto s ot uform, we would lke to have fucto f() uform a sese that f() s proportoal to umber. The best choce would be f ( ) for uform dstrbuto ad o boudary effects. r I real cases of hgher dmesos, boudary effects occur every tme. ot to eglect these effects, let us choose a fucto f ( ), where s a sutable power,. A r sutable choce of wll be dscussed later. III. Fractal systems Oe of the most mportat elemets of the chaos theory are sgularty expoets (also called scalg expoets). They are used multfractal chaotc seres aalyss. We try here to use these expoets a formula for probablty dstrbuto of ear eghbors of a gve posto x. Ths task usually has othg to do wth tme seres but as show already by Madelbrot 982 [9] ay data may possess a fractal or multfractal ature. It ca be foud that the chaos theory provdes some useful elemets ad tools that could be utlzed for estmatg the probablty metoed, ad coseuetly to use them, e.g. for classfcato tasks or for tasks of earest eghbors searchg. The chaos theory s focused o chaotc processes that are descrbed by tme seres. Therefore, the order of values of varables plays sgfcat role. There s a lot of practcal tasks descrbed by data that do ot form a seres. I spte of that some elemets of the chaos theory could be used for
ear eghbor Dstrbuto sets of Fractal ature 6 processg of data of ths kd [27]. Such data are, for example, the well-kow rs data o three speces of rs flower gve by Fsher [33]. Each dvdual sample descrbes oe partcular flower but ether flowers or data about them form seres. There s a set of flowers as well as a set of data wthout ay orderg. The task s to state to what speces a flower belogs accordg to measured data. Also oe ca defe a dstace or dssmlarty defed o the parameter space that descrbes dvdual flowers as pots. ow oe ca defe a dstace betwee two flowers ad also for a partcular flower to fd the earest eghbor,.e. the most smlar oe, the secod earest,.e. the secod most smlar flower ad so o. Fractal systems are kow to exhbt a fractoal power fucto depedece betwee a varable a called a scale (of a resoluto uatty, or of a measure that ca be assocated wth a ouform dstrbuto wth a support defed for the fractal set) ad freuecy s or of probablty P of ts appearace, s h ( a) a or P h ( a) a. Here h s called a fractal dmeso of a fractal system. A multfractal system s a geeralzato of a fractal system whch a sgle expoet h (the fractal dmeso) s ot eough to descrbe ts behavor; stead, a cotuous spectrum of expoets (the so-called sgularty spectrum or Hausdorff spectrum) s eeded. I a multfractal system, a local power law descrbes the behavor aroud ay pot x s( x a s x a h( x) + ) ( ). (2) The expoet h(x ) s called the sgularty (or holde expoet or sgularty stregth [20], as t descrbes the local degree of sgularty or regularty aroud pot x, ad a s called a scale of a multresoluto uatty or of a measure. The esemble formed by all the pots that share the same sgularty expoet s called the sgularty mafold of expoet h, ad s a fractal set of fractal dmeso D(h). The curve D(h) vs. h s called the sgularty spectrum or Hausdorff spectrum ad fully descrbes the (statstcal) dstrbuto of varable a. The sgularty spectrum of a moofractal cossts of a sgle pot. For multfractal objects, oe usually observes a global h power law scalg of the form (2) or smply P( a) a. That s a local power law that descrbes the behavor aroud ay pot x at least some rage of scales ad for some rage of h. Whe such a behavor s observed, oe talks of scale varace, self-smlarty or multscalg. I a slghtly dfferet perspectve, fractal appears as (umerably may) pots a multdmesoal space of dmeso. I practce, the fractal set s gve, sampled, by fte collecto of M samples (patters, measuremets etc.) that form solated pots -dmesoal space. I case of depedet samples there s o orderg or seuece of samples, thus there s o trajectory. Samples ca oly be more smlar or dssmlar some sese. If the -dmesoal space s a metrc space, the most commo dssmlarty measure s a dstace. For a collecto of samples of the fractal set the mutual dstaces betwee samples (pots) are dfferet realzatos of a radom varable called dstace a betwee samples. Ths radom varable has support (0, ). The dstace a betwee samples has dstrbuto fucto F(a) ad correspodg probablty desty P(a). Thus, we have troduced a fractal set, the support o t ad multfractal measure P(a). Ideed, havg a par o. of two samples lyg dstace a oe from the other, the P(a ) s a probablty of appearace a.,.e. of dstace a at place. Apparetly, the way of eumeratg ca be arbtrary. To state a emprcal dstrbuto F(a ) ad desty P(a ) let us eumerate (sort) all dstaces a,, 2,.. M, M ( ) so that a < a +. The dstrbuto fucto 2 F(a ) s the a starcase fucto of steps, ad F(a ) /M. A emprcal dervatve of F(a) at a s P(a ) ad ca be approxmated as P(a ) 0, P(a ) /(a - a - ). More sophstcated approxmatos of P(a ) are possble. After that oe ca use formulas for Hausdorff dmeso f() ad sgularty stregth () above ad get the sgularty spectrum. IV. Aalyss A. Pot process of eghbors Let us assume radom dstrbuto of pots x,, 2,, bouded rego W of -dmesoal space R wth L p metrcs. We cosder appearace of eghbors aroud a gve locato (uery pot) x as a pot process P R [34], [35]. Throughout ths part, we say that pot x s sde S the followg sese: For each eghbor y cosdered, the ball wth ts ceter at x ad radus eual to x-y les sde S. Ths s the case where the boudary effects do ot take place. B. Oe-dmesoal case I ths partcular case, we are terested homogeous Posso process oly. Postos of pots o R + from locato u 0 are characterzed by dstaces from pot 0; the k-th eghbor of pot 0 appears at dstace r k. We use smple aalogy betwee tme ad dstace here. I the oe-dmesoal homogeous Posso process wth testy λ the ter-arrval tmes have a expoetal dstrbuto [34]. The the dstace betwee two eghbor pots s a radom varable wth expoetal dstrbuto fucto λ P( ) e ad probablty desty fucto λ p ( ) e [34], [35]. For ths dstrbuto fucto the mea s E{} / d ad t s the mea dstace betwee two eghbor pots. Let us mage a postve half-le wth a uery pot at pot 0 ad wth radomly ad uformly spread pots,.e. the dstace betwee two eghbor pots s wth mea d. The uesto s: What s the dstrbuto of the dstace of each pot from pot 0? The dstace of -th pot (-th eghbor of pot 0) s smply the sum of dvdual dstaces betwee two successve pots. These dvdual dstaces are depedet ad have the same expoetal dstrbuto wth /d. Because the dstace s the sum of all successve dstaces, t s also the sum of depedet expoetally dstrbuted radom varables. Ths problem was studed coecto wth mass servce (ueug) systems ad represets the ssue of depedet expoetal servers workg seres
62 Marcel Jra (cascade) [22]. The servers have the same expoetal dstrbuto of the servce tme wth costat. Geerally, the resultg total servce tme after the -th server s gve by gamma dstrbuto wth teger frst parameter or the Erlag dstrbuto Erl(, ) [22] [23], [24] p C. Multdmesoal case! x e λx ( x) λ. Let us have a spatal pot process P R. Pot process P cosdered, as a set of pots ca be a fractal set. It s possble to defe several kds of dstaces betwee pots of P [34].. Oe ca troduce a dstace betwee two pots of P, R j dst(x ; x j ), x, xj P. I a bouded rego W R a cumulatve dstrbuto fucto of R j CI ( lm h( r lj ) (3) ( ) j s deoted as correlato tegral. h(.) s the Heavsde step fucto. Grassberger ad Procacca [7] have troduced correlato dmeso ν as lmt C I ( ν lm (4) r 0 r 2. Let Rdst(x, P) be the shortest dstace from the gve locato u to the earest pot of P. Ths s called the cotact dstace ad the cumulatve dstrbuto fucto of Rdst(x, P) a bouded rego W R s called cotact dstrbuto fucto or the empty space fucto [35]. We ca exted dstaces from a gve locato u to the secod, thrd,... earest eghbor pot of P ad fd the correspodg dstrbuto fucto of reachg k-th earest eghbor. Ths fucto s called a survvor fucto [36]. 3. For dstaces from a gve (ad fxed) locato u to other pots of P we use oto of a dstrbuto mappg fucto (DMF) troduced our prevous works [25], [26]. Defto The dstrbuto desty mappg fucto d(x, of the eghborhood of the uery pot x s fucto d( x, D( x,, r where D(x, s a probablty dstrbuto mappg fucto of the uery pot x ad radus r. I bouded rego W whe usg a proper ormalzato the DMF s, fact, a cumulatve dstrbuto fucto of dstaces from a gve locato u to all other pots of P W. We call t also the ear eghbors dstrbuto fucto of pot u. Here we use some deftos troduced our prevous works [25], [26]. Defto The probablty dstrbuto mappg fucto D(x, of the eghborhood of the uery pot x s fucto D ( x, p( z) dz, B( x, r ) where r s the dstace from the uery pot ad B(x, s the ball wth ceter x ad radus r. ote. It s see that for fxed x the fucto D(x,, r > 0 s mootoously growg from zero to oe. Fuctos D(x, ad d(x, for fxed x are the (cumulatve) probablty dstrbuto fucto ad the probablty desty fucto of the ear eghbor dstace r from pot x, respectvely. We use D(x, ad d(x, mostly ths sese. Defto Let a, b be dstaces of two pots from a uery pot x R. The d a b. ( ) We dfferetate betwee the d () ad dstace d a - b ; we wrte also d () (a, b). D. Scalg Defto Let there be a postve such that D ( x, r ) cost for r 0 +. r We call fucto d () r a power approxmato of the probablty dstrbuto mappg fucto. Ths defto aturally follows a prevous defto. The mportat thg s that the last defto exactly gves a true pcture of the scalg property kow from the fractal ad multfractal systems theory, where s kow as multfractal dmeso, scalg (sgularty or Hölde expoet or sgularty stregth [9], [20], [27], [30]. If P s ofractal, the scalg expoet has teger value for all pots of P. Especally f P s a homogeous Posso process, the [36]. If P s a moofractal, scalg expoet has the same value for all pots of P,.e. a local scalg expoet s eual to correlato dmeso ν. I the other case, P s a multfractal characterzed by local scalg expoet (u) that depeds o partcular posto of locato (uery pot) u uesto. Moreover, there s a well-kow correlato tegral form (3). The correlato tegral ca be rewrtte the form:
ear eghbor Dstrbuto sets of Fractal ature 63 ad also CI ( lm h( r l ), j j I lm D( x, C (. Thus, the correlato tegral s a mea of probablty dstrbuto mappg fuctos for all pots of a set cosdered. The estmated value of the correlato dmeso computed usg well-kow log-log plot [7] depeds heavly o the umber of data pots cosdered. Ths appears especally hgh dmesoal spaces. It has bee dscussed to a large extet [29], [30], [3]. Krakovska [3] gave estmate of the error correlato dmeso estmate. Her estmate evaluates the fluece of the edge (boudary) pheomeo for data a cube. The cube was cosdered as the worst case as data usually form a more rouded cloud the space. At the same tme, she geerated uformly dstrbuted data a cube so that each coordate was a radom varable wth uform dstrbuto o (0, ). Thus, pot process that geerated pots multdmesoal hypercube ths way was ot a multdmesoal Posso process. Multvarate data geerated by the multdmesoal Posso process have fractal dmeso eual to the embeddg space dmesoalty. Data geerated by the method descrbed above the apparetly have lower fractal dmeso ad t ca be easly see that ths s ot caused by lack of data pots or by edge effects (that ca be elmated by the use of the approach descrbed Chap. IV.A). We show t greater detal Chap. VI. V. Results A. Uform-lke dstrbuto ad scalg expoet Let a uery pot x be surrouded by other pots uformly dstrbuted to some dstace d 0 from pot x. It meas that there s a ball B(x, d 0 ) wth pots uformly dstrbuted sde t. I ths case, the umber of pots a ball eghborhood wth ceter at the uery pot x grows wth the -th power of dstace from the uery pot (up to dstace d 0 ). The dstrbuto mappg fucto D(x, also grows wth the -th power of dstace from the uery pot x,.e. the umber of pots ball wth ceter at x ad radus r grows learly wth r. ad D ) ( x, ( r / d for r <0, d 0 0 >, D(x, for r/d 0 >. At the same tme, D(x,, as a fucto of z r, D(x, z), grows learly, too. The dstrbuto desty mappg fucto d(x, z), take as D( x, r ), s a costat for r (0, d 0 ) ad ( r ) zero otherwse. Let r be a dstace of the -th eghbor of pot x, ad z r. It follows that the mea d (), d ( ) E( rk rk ) (pot k- 0 s pot x) of successve eghbor pots s a costat uder the codto of uform dstrbuto. Measurg dstace dmesos by (dstace),.e. by the use of d () we get, fact, the same pcture as oe-dmesoal case, dscussed Sec. IV.B. Because the dstrbuto of d () (0,, r <0, d 0 > s uform, the d () of successve eghbors, d () (r k-, r k ) s a radom varable wth expoetal dstrbuto fucto. It also follows that the d () of the -th earest eghbor from the uery pot s gve by the sum of d () s betwee the successve eghbors. The, t s a radom varable wth the Erlag dstrbuto Erl(, ), / d, where () d s mea d () () betwee the successve eghbors. Defto Let there be ball B(x, wth ceter x ad radus r. Let there be a postve costat. We say that pots ball B(x, are spread uformly wth respect to d (), f d () (r k-, r k ) of two successve eghbors s a radom varable wth expoetal dstrbuto fucto wth parameter. ote that the wordg... are spread uformly wth respect to d () comprses fractal ature of data as well as the edge (boudary) pheomeo. I the frst case, the s the local scalg expoet; the other case, the captures the boudary effect. Usually the value of s gve by a mx of these two pheomea ad s lower tha t would be for these two pheomea separately. Theorem Let, for uery pot x R, there exst a scalg expoet ad a dstace r such that ball B(x, wth ceter x ad radus r the pots are spread uformly wth respect to d (). Let d () (x, x + ) betwee two successve ear eghbors of pot x have mea E(d () ) ad let /. The, the d () of the k-th earest eghbor of pot x s the radom varable wth the Erlag dstrbuto Erl(d (), k, ),.e. F( d ( ) ) exp( γ d k ( ) ) j 0 ( λd k λ k f ( d( ) ) d( ) exp( γd ( ) ) k!. j! ) j ( ) Proof. Let us deote the -th earest eghbor of the pot x by x, ts dstace from pot x by d, ts d () (x, x ) by d (). Let us troduce a mappg Z: x R +: Z(x ) d (),.e. pots x are mapped to pots o the straght le, the uery pot x to pot 0. Let total umber of pots x ball B(x, be. The, the dstrbuto mappg fucto s d ( x, cost. It r follows from the assumpto that umber of pots at dstace from pot x grows learly wth. The, there s a proportoalty costat /r. It follows that mappg Z
64 Marcel Jra pots x, x 2,... are dstrbuted radomly ad uformly, ad mea d () of the eghbor pars s r / / λ. We, the, have uform dstrbuto of pots o d () ad the the d () betwee the eghbor pots has expoetal dstrbuto [23] wth parameter. From t d () of the k-th pot x k from pot x s gve by the sum of d () s betwee successve eghbors. The, t s a radom varable eual to the sum of radom varables wth detcal expoetal dstrbuto [23], [24] wth parameter, the d xk ) Erl( d, k, ). ( )( ( ) λ ote that ths theorem s a geeralzato of the theorem [27] derved wth the use of the spatal pot process. I [28] author speaks about geeralzed gamma dstrbuto, but, at the same tme, cosders the shape parameter as a postve teger that s the specal case ofte called the Erlag dstrbuto. Moreover, he cosders a volume of a ball the embeddg space of dmeso m (postve teger; here deoted by ) whle we use more geeral dstrbuto mappg (scalg) expoet (real, postve). B. Emprcal dstrbuto ad emprcal DME We have poted out that the dstrbuto mappg expoet s othg else tha the multfractal dmeso (also kow as scalg (sgularty or Hölde expoet or sgularty stregth) [7], [9], [20], [2], [27]. There s also close relato to a well-kow correlato tegral [7] ad correlato dmeso. We have show that correlato tegral ca be uderstood as a mea of probablty dstrbuto mappg fuctos for all pots of the data set. The multfractal dmeso, scalg (sgularty or Hölde expoet or sgularty stregth s ofte used for characterzato of oe dmesoal or two-dmesoal data,.e. for sgals ad pctures. Our results are vald for multvarate data that eed ot form a seres because data are cosdered as dvdual pots a multvarate space wth proper metrcs. fractal dmeso ad t ca be easly see that ths s ot caused by lack of data pots or by edge effects, as show the followg. After samples were geerated, each sample was take as a uery pot x ad for each sample 0 earest eghbors were foud ad ther dstaces r,, 2,... 0 recorded. After that a dstrbuto-mappg expoet was foud as a value for whch mea values of r grow learly,.e. successve dffereces of ther meas d E r ) E( r ) are ( approxmately costat; r 0. I ths expermet, results 0 are apparetly flueced by procedure of pots geerato because a opposte case the dstrbuto mappg expoet would be eual to data space dmeso. Truly, we orgaze the smulatos ths way to show pots ad 2 above. The values of foud for dmeso from to 40 are show the secod colum of Table. foud by smulato 5 4.8 0 9 20 5.6 40 28 Table. Values of the dstrbuto-mappg expoet for some uform -dmesoal cubes foud by smulato ad approxmated. I Fg., dffereces d E r ) E( r ), r 0 ( 0 for 20 ad three dfferet expoets are show. Here t s see that the value 5.6 gves approxmately a horzotal straght le. It meas that dffereces d () are all early the same. Ths fact correspods to approxmately uform dstrbuto of varable r. The, 5.6 s a good estmate for the dstrbuto-mappg expoet ths case. VI. Smulato aalyss The target of smulato s to demostrate. That the dstrbuto mappg expoet oe ca use as a relatvely global feature for characterzg the data set gve. 2. That the emprcal dstrbuto of varable r of earest eghbors s really close to the Erlag dstrbuto, as stated Theorem. For all smulatos, 32 000 samples -dmesoal cube were used. Data a cube were geerated so that each coordate was a radom varable wth uform dstrbuto o (0, ). Thus, the geerated pots multdmesoal hypercube were ot pots arsg from a multdmesoal Posso process. Oly multvarate data geerated by the multdmesoal Posso process have fractal dmeso eual to the embeddg space dmesoalty. Data geerated by method descrbed the apparetly have lower Fgure. Dffereces d E r ) E( r ), r 0 for ( 0 20, 0 earest eghbors, ad three dfferet expoets. I Fg. 2, hstograms of d () r for 5.6, 20 ad for the frst 0 eghbors (, 2,... 0) are show. The hstograms were smoothed usg averagg over fve values. The b sze s 0.0. The hstograms show, fact, probablty
ear eghbor Dstrbuto sets of Fractal ature 65 desty fuctos for the Erlag dstrbuto for dexes (expoetal) to 0. Ths s just what was expected whe the probablty dstrbuto mappg fucto wth a proper dstrbuto-mappg expoet was used. VII. Dscusso Here we dscuss a problem of emprcal value of the dstrbuto mappg expoet ad ts use. varable wth the Erlag dstrbuto of the k-th order dmshes to zero wth order k gog to fty. O the other had, for k ( fact expoetal dstrbuto),.e. for the earest eghbor, the C V, whle for the secod earest eghbor (k 2) there s C V2 2 / 2 0.707. Ths shows why [25], [26] we do ot recommed to use the frst earest eghbor of each class cotrast to the old fdg by Cover ad Hart [32] that the frst earest eghbor brgs half of formato about class of the uery pot. Because the Erlag dstrbuto coverges to Gaussa dstrbuto for order k, the result accordg to Theorem also cotas some results of e.g. [5], [6], [8], [28] about covergece of ear-eghbor dstaces. Fgure 2. Hstograms of varable z r, 5.6, 20. Les from top to bottom, 2,... 0. I statstcs, t s commo to dfferetate theoretcal ad emprcal dstrbuto. As to scalg expoet, o such a oto was troduced the lterature about fractals ad multfractals. Dscrepaces betwee theoretcal value ad emprcally stated value are explaed by lack of data, ad ether more data are demaded or some correctos are made. Such dscrepaces ad errors were wdely dscussed, see e.g. [29], [30], [3]. I [3] correcto factors for dfferet embeddg dmesos are preseted, based o the assumpto of data uformly dstrbuted a cube. There s a geeral objecto that data do ot form a cube ad that cube s, the, too strct a model for edge pheomeo. Moreover, there s a slghtly problematc way of how uform data are geerated, as dscussed above. Here we do ot use such correctos but a estmated emprcal value. The emprcal value s flueced frst by true local scalg expoet of the data geeratg process, ad secod by edge effect caused by lmted amout of data. Correctos dscussed above try to elmate ths fluece. I the case of probablty desty estmato ad classfcato ad other tasks dealg wth eghbor s dstaces oe eeds to work wth the eghbor s dstace dstrbuto as t appears a gve emprcal evromet, ad dealzed lmt case of the umber of data pots gog to fty s foud as mpractcal. Istead, a large umber of expermets wth the same fte amout of data are cosdered; wth the umber of such expermets evetually gog to fty. ow cosder a dstrbuto of k-th earest eghbor. Let us use the coeffcet of varato C V that s a ormalzed measure of dsperso of a probablty dstrbuto. The coeffcet of varato C V s defed as the rato of the stadard devato to the mea. For the Erlag dstrbuto of k-th order t (results a?) terestg formula for the coeffcet of varato C V / / k. Ths relato shows that relatve spread of VIII. Cocluso Whe usg the oto of dstace, there s a loss of formato o the true dstrbuto of pots the eghborhood of the uery pot. It s kow [7], [8] that for larger dmesos somethg lke local approxmato of real dstrbuto by uform dstrbuto practce does ot exst. We have also show why. O the other had, the assumpto of at least local uformty the eghborhood of a uery pot s usually heret the methods based o the dstaces of eghbors. Itroducg power approxmato of a dstrbuto mappg fucto here solves ths problem. It has bee show that the expoet of the power approxmato s the scalg expoet kow from the theory of multfractals for the umber of pots gog to fty. For fte set, ths expoet cludes boudary effects. By usg the scalg expoet, the real emprcal dstrbuto s trasformed to appear as locally uform. It follows that whe usg expoetally scaled dstace of the k-th eghbor the scaled dstace has the Erlag dstrbuto of order k. Ackowledgmet Ths work was supported by Mstry of Educato of the Czech Republc uder IGO project o. LG 2020. Refereces [] Duda, R.O., Hart, P.E., Stork, D.G., Patter Classfcato. Secod Edto. Joh Wley ad Sos, Ic., ew York, 2000. [2] Slverma, B. W., Desty Estmato for Statstcs ad Data Aalyss. Chapma ad Hall, Lodo, 986. [3] Qamar, A.M., Gausser, E., RELIEF Algorthm ad Smlarty Learg for k-. Iteratoal Joural of Computer Iformato Systems ad Idustral Maagemet Applcatos, Vol 4 (202), pp. 445-458. Avalable at [4] http://www.mrlabs.org/jcsm/regular_papers_202/pa per49.pdf [5] Eade, Wt. T. et al., Statstcal Methods Expermetal Physcs. orth-hollad, 982.
66 Marcel Jra [6] Moore, D.S., Yackel, J.W.: Cosstecy Propertes of earest eghbor Desty Fucto Estmators. Aals of Statstcs, Vol. 5, o., pp. 43-54 (977) [7] Heburg, A., Aggarwal, C.C., Kem, D.A., What s the earest eghbor hgh dmesoal spaces? Proc. of the 26th VLDB Cof., Caro, Egypt, 2000, pp. 506-55. [8] Arya, S., Mout, D.M., araya, O., Accoutg for Boudary Effects earest eghbor Searchg. Dscrete ad Computatoal Geometry, Vol. 6 (996), pp. 55-76. [9] Beyer, K. et al., Whe s earest eghbor Meagful? Proc. of the 7th Iteratoal Coferece o Database Theory. Jerusalem, Israel, 0-2 Jauary 999, pp. 27-235. [0] Bau, G., Cadre, B., Rouvere, L.: Statstcal Aalyss of k-earest eghbor Collaboratve Recommedato. The Aals of Statstcs, Vol. 38, o. 3, pp. 568 592, (200) [] Stute, W.: Asymptotc ormalty of earest eghbor Regresso Fucto Estmates. The Aals of Statstcs, Vol. 2, o. 3., pp. 97-926, (984) [2] Rpley, B.D.: Modellg Spatal Patters. Joural of the Royal Statstcal Socety. Seres B (Methodologcal), Vol. 39, o. 2, pp. 72-22 (977) [3] Dxo, P.M.: Rpley s K fucto. I: Abdel H. El-Shaaraw ad Walter W. Pegorsch (Eds.): Ecyclopeda of Evrometrcs, Volume 3, pp. 796 803 (ISB 047 899976), Joh Wley & Sos, Ltd, Chchester, (2002). [4] Marco, E., Puech, F.: Evaluatg the Geographc Cocetrato of Idustres Usg Dstace-based Methods. Joural of Ecoomc Geography, Vol. 3, o. 4, pp. 409 428 (2003) [5] Evas, D.: A law of large umbers for earest eghbor statstcs. Proc. R. Soc. A., Vol. 464, pp. 375 392 (2008) [6] Boett M, Pagao M.: The terpot dstace dstrbuto as a descrptor of pot patters, wth a applcato to spatal dsease clusterg. Stat. Med. Vol. 24, o. 5, pp. 753-773 (2005) [7] Grassberger, P., Procacca, I.. Measurg the strageess of strage attractors, Physca, Vol. 9D, pp. 89-208, (983). [8] Slverma, B.W.: Lmt theorems for dssocated radom varables. Advaces Appled Probablty, Vol. 8, pp. 806-89 (976) [9] Madelbrot, B.B.: The Fractal Geometry of ature, W. H. Freema & Co; ISB 0-767-86-9 (982). [20] Chhabra, A., Jese, R.V.: Drect Determato of the f(α) Sgularty Spectrum. Physcal Revew Letters, Vol. 62, o. 2, pp. 327-330 (989) [2] Hu, J., Gao, J. ad Wag, X.: Multfractal aalyss of suspot tme seres: the effects of the -year cycle ad Fourer trucato. Joural of Statstcal Mechacs P2066, 20 pp. (2009) [22] Trved, K.S.: Probablty ad Statstcs wth Relablty, Queug, ad Computer Scece Applcatos. Pretce-Hall, Ic., Eglewood Clffs, J 07632, USA, (982) [23] Klerock, L., Queueg Systems, Volume I: Theory. Joh Wley & Sos, ew York, 975. [24] Demaret, J.C., Gareet, A., Sum of Expoetal Radom Varables. AEÜ, Vol. 3, 977, o., pp. 445-448. [25] Ja, M. Classfcato of Multvarate Data Usg Dstrbuto Mappg Expoet. I: Proceedgs of the Iteratoal Coferece Memoram Joh vo euma, pp. 55-68. Budapest: Budapest Muszak Foskola (Budapest Polytechc), ISB 963-754-2-3. [Iteratoal Coferece Memoram Joh vo euma, Budapest, 2.2.2003, HU] (2003) [26] Ja, M. Local Estmate of Dstrbuto Mappg Expoet for Classfcato of Multvarate Data. I: Egeerg of Itellget Systems. Mllet : ICSC, 2004. s. -6. ISB 3-906454-35-5. [Iteratoal ICSC Symposum o Egeerg of Itellget Systems /4./, Madera, 29.02.2004-02.03.2004, PT] (2004) [27] Madelbrot, B.B., Calvet, L., Fsher, A., 997. A Multfractal Model of Asset Returs. Workg Paper. Yale Uversty. Cowles Foudato Dscusso Paper #64. Avalable from: http://deas.uam.ca/deas/data/ Papers /cwlcwldpp64.html. [28] Haegg, M.: O Dstaces Uformly Radom etworks. IEEE Tras o Iformato Theory, Vol. 5, o. 0, Oct. 2005, pp. 3584-3586. [29] Smth, L.A.: Itrsc lmts o dmeso calculatos. Physcs Letters A, Vol. 33, o. 6, pp. 283-288 (988) [30] Theler, J.: Estmatg fractal dmeso. J. Opt. Soc. Am. A, Vol. 7, o. 6, pp. 055-073. (990) [3] Krakovská, A.: Correlato dmeso uderestmato. Acta Physca Slovaca, Vol. 45, o. 5, pp. 567-574 (995) [32] Cover, T.M., Hart, P.E.: earest eghbor Patter Classfcato. IEEE Tras. o Iformato Theory, Vol. IT-3, o., pp. 2-27 (967) [33] Fsher,R.A.: The use of multple measuremets taxoomc problems. Aual Eugecs, Vol. 7, Part II, pp. 79-88 (936); also : "Cotrbutos to Mathematcal Statstcs (Joh Wley, Y, 950). [34] Dggle, P.J.: Statstcal Aalyss of Spatal Pot Patters. AROLD, Lodo, 2003. [35] Baddeley, A: Spatal pot processes ad ther applcatos. Lecture otes Mathematcs 892, Sprger-Verlag, Berl, 2007, pp. 75. [36] Daley, D. J., Vere-Joes, D. A Itroducto to the Theory of Pot Processes.Vol. I: Elemetary theory ad methods. Sprger, 2005. Author Bography Marcel Ja Educato & Work experece: Faculty of Electrcal Egeerg, CTU Prague (962), Faculty of Mathematcs ad Physcs, Charles Uversty Prague (974), Ph.D. (972). Head of Departmet of System Desg at the Research Isttute of Mathematcal Maches Prague (962-986), researcher at the Isttute of Computer Applcatos Maagemet, Prague (986-989), curretly researcher at the Isttute of Computer Scece of Academy of Sceces CR, at the Departmet of olear Modelg.