Near Neighbor Distribution in Sets of Fractal Nature


 Bruno Lee
 1 years ago
 Views:
Transcription
1 Iteratoal Joural of Computer Iformato Systems ad Idustral Maagemet Applcatos. ISS Volume 5 (202) 3 pp MIR Labs, ear eghbor Dstrbuto Sets of Fractal ature Marcel Ja Isttute of Computer Scece AS CR Dept. of olear Modelg Pod Vodáreskou Vží Prague, Czech Republc emal: Abstract: Dstaces of several earest eghbors of a gve pot a multdmesoal space play a mportat role some tasks of data mg. Here we aalyze these dstaces as radom varables defed to be fuctos of a gve pot ad ts kth earest eghbor. We prove that f there s a costat such that the mea kth eghbor dstace to ths costat power s proportoal to the ear eghbor dex k the ts dstace to ths costat power coverges to the Erlag dstrbuto of order k. We also show that costat s the scalg expoet kow from the theory of multfractals. Keywords: earest eghbor, fractal set, multfractal, Erlag dstrbuto I. Itroducto There are dstct problems dealg wth the earest eghbor or several earest eghbors to a gve pot data mg methods ad procedures. A rather strage behavor of the earest eghbors hgh dmesoal spaces was studed from dfferet pots of vew. Oe of them s a oparametrc techue of probablty desty estmato multdmesoal data space [], [2]. Classfcato to two or more classes ad the probablty desty estmate usg several earest eghbors s a typcal task [], [2], [5], [6]. Also clusterg s a problem where terpot dstaces are commo use. Ths class of problems looks for the hghest ualty of the probablty desty estmato, whle effcecy ad speed are secodary. The other ssue s the problem of earest eghbor searchg. Ths task s terestg ad mportat database applcatos [7], [8], [9]. A typcal task of searchg large databases s searchg for other earest eghbor ueres. For ths class of problems, maxmal performace,.e. the speed of earest eghbor searchg, s a prmary task. The use of the kearest eghbor approach for other applcatos was studed, also e.g. by [0], ad []. For the problem of searchg the earest eghbor large databases, the boudary pheomeo was studed [8]. It was show by the use of L max metrc why the actual performaces of the earest eghbor searchg algorthms ted to be much better tha ther theoretcal aalyses would suggest. The cause s just the boudary effect a hgh dmesoal space. I [9] t was foud that as dmesoalty creases, the dstace to the earest data pot approaches the dstace to the farthest data pot of the learg set. For probablty desty estmato by the kearest eghbor method, the best value of k must be carefully tued to fd optmal results. The value of k s also depedet o the total umber of samples of the learg (or trag) data set. Let a pot x (a uery pot [7] ) be gve ad let there be a ball wth ts ceter x ad cotag k pots of the learg set t. Let the volume of ths ball be V k. The, for the probablty desty estmate pot x t holds [] k / pk ( x). () V A smple expermet wth desty estmato at the ceter of a multdmesoal cube wth uformly dstrbuted pots wll show that startg from some k the value of p k (x) for larger k s ot costat, as t should be, but lesses. The boudary effect surprsgly flueces the kth earest eghbor about whch everybody would say that ths eghbor s stll very ear to pot x. I both cases, a statstcs of terpot dstaces s of prmary terest. Oe ofteused tool s the Rpley Kfucto [2] usually wrtte the form: K(  E(umber of further pots wth dstace r of a arbtrary pot), where s the testy of the process or the expected umber of evets per ut of area (assumed costat). The Kfucto has bee studed from dfferet pots of vew vast lterature. I some works, the pot process s lmted to two dmesos ad Kfucto as well [3], [4]. Marco ad Puech [4] summarze mportat features of the Kfucto cludg dfferet correctos of edge (boudary) effects. They summarze mportat features of the Kfucto cludg dfferet correctos of edge (boudary) effects. A edge effect meas that pots located close to the doma borders are problematc because a part of the crcle sde k MIR Labs, USA
2 60 Marcel Jra whch pots are supposed to be couted s outsde the doma. Result of gorg ths effect s uderestmatg K. D. Evas [5] studes a class of radom varables defed to be fuctos of a gve pot ad ts earest eghbors. If the sample pots are depedet ad detcally dstrbuted, the assocated radom varables wll also be detcally dstrbuted, but ot depedet. He shows that radom varables of ths type satsfy a strog law of large umbers, the sese that ther sample meas coverge to ther expected values almost surely as the umber of sample pots. Moreover, Evas troduced a terestg lemma that for every coutable set of pots R d, ay pot ca be the (frst) earest eghbor of at most t(dv dp ) other pots of the set, where V dp s the volume of the ut ball R d wth L p orm. Dealg wth Kfuctos, t was already Rpley [2] who poted out that the Kfucto shares some of the propertes of the terpot dstrbuto fucto, eve though t s ot a dstrbuto fucto because K( as r. Later o, Boett ad Pagao [6] troduced emprcal cumulatve dstrbuto fucto F (d) of dstaces d for total samples. Ths s, fact, a correlato tegral [7] oe of ts may forms. Boett ad Pagao the gave a proof that the scaled dstrbuto of F (d) computed at a fte set of values, d, d 2,... coverges to a multvarate ormal dstrbuto as. The same result was gve by a dfferet way already by Slverma [8]. The goal of ths study s to aalyze the dstaces of the earest eghbors from gve pot x ad the dstaces betwee two of these eghbors, the th ad ()st the space of radomly dstrbuted pots. We show that the kth earest eghbor dstace from a fxed pot (the uery pot) s dstrbuted accordg to modfed Erlag dstrbuto of order k. The modfcato depeds o that oe uses a varable that s eual to the dstace to the proper expoet power stead of the dstace aloe as a depedet varable. I ths paper we frst pot out some features of eghbor dstaces a multdmesoal space. Secod, we remd of the probablty dstrbuto mappg fucto, ad the dstrbuto desty mappg fucto. These fuctos map probablty dstrbuto of pots R to a smlar dstrbuto the space of dstaces, whch s oedmesoal,.e. R. Thrd, fluece of boudary effects o the probablty dstrbuto mappg fucto s show, ad the power approxmato of the probablty dstrbuto mappg fucto the form of (dstace) s troduced. We show that expoet s, fact, the scalg expoet kow from the theory of multfractals [9], [20], [2]. ouform as well as uform dstrbutos are cosdered. I cocluso, we ca say that the earest eghbor space does ot look so strage as show [7], [8] whe we look at t from the pot of vew of a olear scale measured by sutable power of eghbors dstaces. II. ear eghbors Dstrbuto Problem The earesteghborbased methods usually use () for a probablty desty estmate ad are based o the dstaces of eghbors from a gve pot. Usg the eghbor dstaces or terpot dstaces for the probablty desty estmato should copy the features of the probablty desty fucto based o real data. The dea of most eareghborsbased methods as well as kerel methods [2] assumes a reasoable statstcal dstrbuto the eghborhood of the pot uesto. That meas that for ay pot x (the uery pot [7] ), the statstcal dstrbuto of the data pots x surroudg t s supposed to be depedet of the locato of the eghbor pots ad ther dstaces x from pot x. Ths assumpto s ofte ot met, especally for small data sets ad hgher dmesos. To llustrate ths, let us cosder Eucldea metrcs ad uformly dstrbuted pots cube (0.5, + 0.5). Let there be a ball wth ts ceter the org ad the radus eual to Ths ball occupes 4 3π ,.e. more tha 52 % of that cube a threedmesoal space, ,.e. 8 % of ut cube 6dmesoal space, dmesoal space, ad 3.28e dmesoal space. It s the see that startg by some dmeso, say 5 or 6 ad some dex, the th earest eghbor does ot le such a ball aroud pot x but somewhere the corer of that cube but outsde ths ball. Drawg a larger ball, we ca see that some parts of t are empty, especally parts ear to ts surface. I farther places from the org, the space thus seems to be less dese tha ear the org. It follows that ths th eghbor les farther from pot x as would follow from the supposed uformty of dstrbuto. The fucto f ( ) r, where r s the mea dstace of the th eghbor from pot x, should grow learly wth dex the case of uform dstrbuto wthout the boudary effect metoed. I the other case, ths fucto grows faster tha learly. There s some ukow dstrbuto of pots the eghborhood of pot x. If ths dstrbuto s ot uform, we would lke to have fucto f() uform a sese that f() s proportoal to umber. The best choce would be f ( ) for uform dstrbuto ad o boudary effects. r I real cases of hgher dmesos, boudary effects occur every tme. ot to eglect these effects, let us choose a fucto f ( ), where s a sutable power,. A r sutable choce of wll be dscussed later. III. Fractal systems Oe of the most mportat elemets of the chaos theory are sgularty expoets (also called scalg expoets). They are used multfractal chaotc seres aalyss. We try here to use these expoets a formula for probablty dstrbuto of ear eghbors of a gve posto x. Ths task usually has othg to do wth tme seres but as show already by Madelbrot 982 [9] ay data may possess a fractal or multfractal ature. It ca be foud that the chaos theory provdes some useful elemets ad tools that could be utlzed for estmatg the probablty metoed, ad coseuetly to use them, e.g. for classfcato tasks or for tasks of earest eghbors searchg. The chaos theory s focused o chaotc processes that are descrbed by tme seres. Therefore, the order of values of varables plays sgfcat role. There s a lot of practcal tasks descrbed by data that do ot form a seres. I spte of that some elemets of the chaos theory could be used for
3 ear eghbor Dstrbuto sets of Fractal ature 6 processg of data of ths kd [27]. Such data are, for example, the wellkow rs data o three speces of rs flower gve by Fsher [33]. Each dvdual sample descrbes oe partcular flower but ether flowers or data about them form seres. There s a set of flowers as well as a set of data wthout ay orderg. The task s to state to what speces a flower belogs accordg to measured data. Also oe ca defe a dstace or dssmlarty defed o the parameter space that descrbes dvdual flowers as pots. ow oe ca defe a dstace betwee two flowers ad also for a partcular flower to fd the earest eghbor,.e. the most smlar oe, the secod earest,.e. the secod most smlar flower ad so o. Fractal systems are kow to exhbt a fractoal power fucto depedece betwee a varable a called a scale (of a resoluto uatty, or of a measure that ca be assocated wth a ouform dstrbuto wth a support defed for the fractal set) ad freuecy s or of probablty P of ts appearace, s h ( a) a or P h ( a) a. Here h s called a fractal dmeso of a fractal system. A multfractal system s a geeralzato of a fractal system whch a sgle expoet h (the fractal dmeso) s ot eough to descrbe ts behavor; stead, a cotuous spectrum of expoets (the socalled sgularty spectrum or Hausdorff spectrum) s eeded. I a multfractal system, a local power law descrbes the behavor aroud ay pot x s( x a s x a h( x) + ) ( ). (2) The expoet h(x ) s called the sgularty (or holde expoet or sgularty stregth [20], as t descrbes the local degree of sgularty or regularty aroud pot x, ad a s called a scale of a multresoluto uatty or of a measure. The esemble formed by all the pots that share the same sgularty expoet s called the sgularty mafold of expoet h, ad s a fractal set of fractal dmeso D(h). The curve D(h) vs. h s called the sgularty spectrum or Hausdorff spectrum ad fully descrbes the (statstcal) dstrbuto of varable a. The sgularty spectrum of a moofractal cossts of a sgle pot. For multfractal objects, oe usually observes a global h power law scalg of the form (2) or smply P( a) a. That s a local power law that descrbes the behavor aroud ay pot x at least some rage of scales ad for some rage of h. Whe such a behavor s observed, oe talks of scale varace, selfsmlarty or multscalg. I a slghtly dfferet perspectve, fractal appears as (umerably may) pots a multdmesoal space of dmeso. I practce, the fractal set s gve, sampled, by fte collecto of M samples (patters, measuremets etc.) that form solated pots dmesoal space. I case of depedet samples there s o orderg or seuece of samples, thus there s o trajectory. Samples ca oly be more smlar or dssmlar some sese. If the dmesoal space s a metrc space, the most commo dssmlarty measure s a dstace. For a collecto of samples of the fractal set the mutual dstaces betwee samples (pots) are dfferet realzatos of a radom varable called dstace a betwee samples. Ths radom varable has support (0, ). The dstace a betwee samples has dstrbuto fucto F(a) ad correspodg probablty desty P(a). Thus, we have troduced a fractal set, the support o t ad multfractal measure P(a). Ideed, havg a par o. of two samples lyg dstace a oe from the other, the P(a ) s a probablty of appearace a.,.e. of dstace a at place. Apparetly, the way of eumeratg ca be arbtrary. To state a emprcal dstrbuto F(a ) ad desty P(a ) let us eumerate (sort) all dstaces a,, 2,.. M, M ( ) so that a < a +. The dstrbuto fucto 2 F(a ) s the a starcase fucto of steps, ad F(a ) /M. A emprcal dervatve of F(a) at a s P(a ) ad ca be approxmated as P(a ) 0, P(a ) /(a  a  ). More sophstcated approxmatos of P(a ) are possble. After that oe ca use formulas for Hausdorff dmeso f() ad sgularty stregth () above ad get the sgularty spectrum. IV. Aalyss A. Pot process of eghbors Let us assume radom dstrbuto of pots x,, 2,, bouded rego W of dmesoal space R wth L p metrcs. We cosder appearace of eghbors aroud a gve locato (uery pot) x as a pot process P R [34], [35]. Throughout ths part, we say that pot x s sde S the followg sese: For each eghbor y cosdered, the ball wth ts ceter at x ad radus eual to xy les sde S. Ths s the case where the boudary effects do ot take place. B. Oedmesoal case I ths partcular case, we are terested homogeous Posso process oly. Postos of pots o R + from locato u 0 are characterzed by dstaces from pot 0; the kth eghbor of pot 0 appears at dstace r k. We use smple aalogy betwee tme ad dstace here. I the oedmesoal homogeous Posso process wth testy λ the terarrval tmes have a expoetal dstrbuto [34]. The the dstace betwee two eghbor pots s a radom varable wth expoetal dstrbuto fucto λ P( ) e ad probablty desty fucto λ p ( ) e [34], [35]. For ths dstrbuto fucto the mea s E{} / d ad t s the mea dstace betwee two eghbor pots. Let us mage a postve halfle wth a uery pot at pot 0 ad wth radomly ad uformly spread pots,.e. the dstace betwee two eghbor pots s wth mea d. The uesto s: What s the dstrbuto of the dstace of each pot from pot 0? The dstace of th pot (th eghbor of pot 0) s smply the sum of dvdual dstaces betwee two successve pots. These dvdual dstaces are depedet ad have the same expoetal dstrbuto wth /d. Because the dstace s the sum of all successve dstaces, t s also the sum of depedet expoetally dstrbuted radom varables. Ths problem was studed coecto wth mass servce (ueug) systems ad represets the ssue of depedet expoetal servers workg seres
4 62 Marcel Jra (cascade) [22]. The servers have the same expoetal dstrbuto of the servce tme wth costat. Geerally, the resultg total servce tme after the th server s gve by gamma dstrbuto wth teger frst parameter or the Erlag dstrbuto Erl(, ) [22] [23], [24] p C. Multdmesoal case! x e λx ( x) λ. Let us have a spatal pot process P R. Pot process P cosdered, as a set of pots ca be a fractal set. It s possble to defe several kds of dstaces betwee pots of P [34].. Oe ca troduce a dstace betwee two pots of P, R j dst(x ; x j ), x, xj P. I a bouded rego W R a cumulatve dstrbuto fucto of R j CI ( lm h( r lj ) (3) ( ) j s deoted as correlato tegral. h(.) s the Heavsde step fucto. Grassberger ad Procacca [7] have troduced correlato dmeso ν as lmt C I ( ν lm (4) r 0 r 2. Let Rdst(x, P) be the shortest dstace from the gve locato u to the earest pot of P. Ths s called the cotact dstace ad the cumulatve dstrbuto fucto of Rdst(x, P) a bouded rego W R s called cotact dstrbuto fucto or the empty space fucto [35]. We ca exted dstaces from a gve locato u to the secod, thrd,... earest eghbor pot of P ad fd the correspodg dstrbuto fucto of reachg kth earest eghbor. Ths fucto s called a survvor fucto [36]. 3. For dstaces from a gve (ad fxed) locato u to other pots of P we use oto of a dstrbuto mappg fucto (DMF) troduced our prevous works [25], [26]. Defto The dstrbuto desty mappg fucto d(x, of the eghborhood of the uery pot x s fucto d( x, D( x,, r where D(x, s a probablty dstrbuto mappg fucto of the uery pot x ad radus r. I bouded rego W whe usg a proper ormalzato the DMF s, fact, a cumulatve dstrbuto fucto of dstaces from a gve locato u to all other pots of P W. We call t also the ear eghbors dstrbuto fucto of pot u. Here we use some deftos troduced our prevous works [25], [26]. Defto The probablty dstrbuto mappg fucto D(x, of the eghborhood of the uery pot x s fucto D ( x, p( z) dz, B( x, r ) where r s the dstace from the uery pot ad B(x, s the ball wth ceter x ad radus r. ote. It s see that for fxed x the fucto D(x,, r > 0 s mootoously growg from zero to oe. Fuctos D(x, ad d(x, for fxed x are the (cumulatve) probablty dstrbuto fucto ad the probablty desty fucto of the ear eghbor dstace r from pot x, respectvely. We use D(x, ad d(x, mostly ths sese. Defto Let a, b be dstaces of two pots from a uery pot x R. The d a b. ( ) We dfferetate betwee the d () ad dstace d a  b ; we wrte also d () (a, b). D. Scalg Defto Let there be a postve such that D ( x, r ) cost for r 0 +. r We call fucto d () r a power approxmato of the probablty dstrbuto mappg fucto. Ths defto aturally follows a prevous defto. The mportat thg s that the last defto exactly gves a true pcture of the scalg property kow from the fractal ad multfractal systems theory, where s kow as multfractal dmeso, scalg (sgularty or Hölde expoet or sgularty stregth [9], [20], [27], [30]. If P s ofractal, the scalg expoet has teger value for all pots of P. Especally f P s a homogeous Posso process, the [36]. If P s a moofractal, scalg expoet has the same value for all pots of P,.e. a local scalg expoet s eual to correlato dmeso ν. I the other case, P s a multfractal characterzed by local scalg expoet (u) that depeds o partcular posto of locato (uery pot) u uesto. Moreover, there s a wellkow correlato tegral form (3). The correlato tegral ca be rewrtte the form:
5 ear eghbor Dstrbuto sets of Fractal ature 63 ad also CI ( lm h( r l ), j j I lm D( x, C (. Thus, the correlato tegral s a mea of probablty dstrbuto mappg fuctos for all pots of a set cosdered. The estmated value of the correlato dmeso computed usg wellkow loglog plot [7] depeds heavly o the umber of data pots cosdered. Ths appears especally hgh dmesoal spaces. It has bee dscussed to a large extet [29], [30], [3]. Krakovska [3] gave estmate of the error correlato dmeso estmate. Her estmate evaluates the fluece of the edge (boudary) pheomeo for data a cube. The cube was cosdered as the worst case as data usually form a more rouded cloud the space. At the same tme, she geerated uformly dstrbuted data a cube so that each coordate was a radom varable wth uform dstrbuto o (0, ). Thus, pot process that geerated pots multdmesoal hypercube ths way was ot a multdmesoal Posso process. Multvarate data geerated by the multdmesoal Posso process have fractal dmeso eual to the embeddg space dmesoalty. Data geerated by the method descrbed above the apparetly have lower fractal dmeso ad t ca be easly see that ths s ot caused by lack of data pots or by edge effects (that ca be elmated by the use of the approach descrbed Chap. IV.A). We show t greater detal Chap. VI. V. Results A. Uformlke dstrbuto ad scalg expoet Let a uery pot x be surrouded by other pots uformly dstrbuted to some dstace d 0 from pot x. It meas that there s a ball B(x, d 0 ) wth pots uformly dstrbuted sde t. I ths case, the umber of pots a ball eghborhood wth ceter at the uery pot x grows wth the th power of dstace from the uery pot (up to dstace d 0 ). The dstrbuto mappg fucto D(x, also grows wth the th power of dstace from the uery pot x,.e. the umber of pots ball wth ceter at x ad radus r grows learly wth r. ad D ) ( x, ( r / d for r <0, d 0 0 >, D(x, for r/d 0 >. At the same tme, D(x,, as a fucto of z r, D(x, z), grows learly, too. The dstrbuto desty mappg fucto d(x, z), take as D( x, r ), s a costat for r (0, d 0 ) ad ( r ) zero otherwse. Let r be a dstace of the th eghbor of pot x, ad z r. It follows that the mea d (), d ( ) E( rk rk ) (pot k 0 s pot x) of successve eghbor pots s a costat uder the codto of uform dstrbuto. Measurg dstace dmesos by (dstace),.e. by the use of d () we get, fact, the same pcture as oedmesoal case, dscussed Sec. IV.B. Because the dstrbuto of d () (0,, r <0, d 0 > s uform, the d () of successve eghbors, d () (r k, r k ) s a radom varable wth expoetal dstrbuto fucto. It also follows that the d () of the th earest eghbor from the uery pot s gve by the sum of d () s betwee the successve eghbors. The, t s a radom varable wth the Erlag dstrbuto Erl(, ), / d, where () d s mea d () () betwee the successve eghbors. Defto Let there be ball B(x, wth ceter x ad radus r. Let there be a postve costat. We say that pots ball B(x, are spread uformly wth respect to d (), f d () (r k, r k ) of two successve eghbors s a radom varable wth expoetal dstrbuto fucto wth parameter. ote that the wordg... are spread uformly wth respect to d () comprses fractal ature of data as well as the edge (boudary) pheomeo. I the frst case, the s the local scalg expoet; the other case, the captures the boudary effect. Usually the value of s gve by a mx of these two pheomea ad s lower tha t would be for these two pheomea separately. Theorem Let, for uery pot x R, there exst a scalg expoet ad a dstace r such that ball B(x, wth ceter x ad radus r the pots are spread uformly wth respect to d (). Let d () (x, x + ) betwee two successve ear eghbors of pot x have mea E(d () ) ad let /. The, the d () of the kth earest eghbor of pot x s the radom varable wth the Erlag dstrbuto Erl(d (), k, ),.e. F( d ( ) ) exp( γ d k ( ) ) j 0 ( λd k λ k f ( d( ) ) d( ) exp( γd ( ) ) k!. j! ) j ( ) Proof. Let us deote the th earest eghbor of the pot x by x, ts dstace from pot x by d, ts d () (x, x ) by d (). Let us troduce a mappg Z: x R +: Z(x ) d (),.e. pots x are mapped to pots o the straght le, the uery pot x to pot 0. Let total umber of pots x ball B(x, be. The, the dstrbuto mappg fucto s d ( x, cost. It r follows from the assumpto that umber of pots at dstace from pot x grows learly wth. The, there s a proportoalty costat /r. It follows that mappg Z
6 64 Marcel Jra pots x, x 2,... are dstrbuted radomly ad uformly, ad mea d () of the eghbor pars s r / / λ. We, the, have uform dstrbuto of pots o d () ad the the d () betwee the eghbor pots has expoetal dstrbuto [23] wth parameter. From t d () of the kth pot x k from pot x s gve by the sum of d () s betwee successve eghbors. The, t s a radom varable eual to the sum of radom varables wth detcal expoetal dstrbuto [23], [24] wth parameter, the d xk ) Erl( d, k, ). ( )( ( ) λ ote that ths theorem s a geeralzato of the theorem [27] derved wth the use of the spatal pot process. I [28] author speaks about geeralzed gamma dstrbuto, but, at the same tme, cosders the shape parameter as a postve teger that s the specal case ofte called the Erlag dstrbuto. Moreover, he cosders a volume of a ball the embeddg space of dmeso m (postve teger; here deoted by ) whle we use more geeral dstrbuto mappg (scalg) expoet (real, postve). B. Emprcal dstrbuto ad emprcal DME We have poted out that the dstrbuto mappg expoet s othg else tha the multfractal dmeso (also kow as scalg (sgularty or Hölde expoet or sgularty stregth) [7], [9], [20], [2], [27]. There s also close relato to a wellkow correlato tegral [7] ad correlato dmeso. We have show that correlato tegral ca be uderstood as a mea of probablty dstrbuto mappg fuctos for all pots of the data set. The multfractal dmeso, scalg (sgularty or Hölde expoet or sgularty stregth s ofte used for characterzato of oe dmesoal or twodmesoal data,.e. for sgals ad pctures. Our results are vald for multvarate data that eed ot form a seres because data are cosdered as dvdual pots a multvarate space wth proper metrcs. fractal dmeso ad t ca be easly see that ths s ot caused by lack of data pots or by edge effects, as show the followg. After samples were geerated, each sample was take as a uery pot x ad for each sample 0 earest eghbors were foud ad ther dstaces r,, 2,... 0 recorded. After that a dstrbutomappg expoet was foud as a value for whch mea values of r grow learly,.e. successve dffereces of ther meas d E r ) E( r ) are ( approxmately costat; r 0. I ths expermet, results 0 are apparetly flueced by procedure of pots geerato because a opposte case the dstrbuto mappg expoet would be eual to data space dmeso. Truly, we orgaze the smulatos ths way to show pots ad 2 above. The values of foud for dmeso from to 40 are show the secod colum of Table. foud by smulato Table. Values of the dstrbutomappg expoet for some uform dmesoal cubes foud by smulato ad approxmated. I Fg., dffereces d E r ) E( r ), r 0 ( 0 for 20 ad three dfferet expoets are show. Here t s see that the value 5.6 gves approxmately a horzotal straght le. It meas that dffereces d () are all early the same. Ths fact correspods to approxmately uform dstrbuto of varable r. The, 5.6 s a good estmate for the dstrbutomappg expoet ths case. VI. Smulato aalyss The target of smulato s to demostrate. That the dstrbuto mappg expoet oe ca use as a relatvely global feature for characterzg the data set gve. 2. That the emprcal dstrbuto of varable r of earest eghbors s really close to the Erlag dstrbuto, as stated Theorem. For all smulatos, samples dmesoal cube were used. Data a cube were geerated so that each coordate was a radom varable wth uform dstrbuto o (0, ). Thus, the geerated pots multdmesoal hypercube were ot pots arsg from a multdmesoal Posso process. Oly multvarate data geerated by the multdmesoal Posso process have fractal dmeso eual to the embeddg space dmesoalty. Data geerated by method descrbed the apparetly have lower Fgure. Dffereces d E r ) E( r ), r 0 for ( 0 20, 0 earest eghbors, ad three dfferet expoets. I Fg. 2, hstograms of d () r for 5.6, 20 ad for the frst 0 eghbors (, 2,... 0) are show. The hstograms were smoothed usg averagg over fve values. The b sze s 0.0. The hstograms show, fact, probablty
7 ear eghbor Dstrbuto sets of Fractal ature 65 desty fuctos for the Erlag dstrbuto for dexes (expoetal) to 0. Ths s just what was expected whe the probablty dstrbuto mappg fucto wth a proper dstrbutomappg expoet was used. VII. Dscusso Here we dscuss a problem of emprcal value of the dstrbuto mappg expoet ad ts use. varable wth the Erlag dstrbuto of the kth order dmshes to zero wth order k gog to fty. O the other had, for k ( fact expoetal dstrbuto),.e. for the earest eghbor, the C V, whle for the secod earest eghbor (k 2) there s C V2 2 / Ths shows why [25], [26] we do ot recommed to use the frst earest eghbor of each class cotrast to the old fdg by Cover ad Hart [32] that the frst earest eghbor brgs half of formato about class of the uery pot. Because the Erlag dstrbuto coverges to Gaussa dstrbuto for order k, the result accordg to Theorem also cotas some results of e.g. [5], [6], [8], [28] about covergece of eareghbor dstaces. Fgure 2. Hstograms of varable z r, 5.6, 20. Les from top to bottom, 2, I statstcs, t s commo to dfferetate theoretcal ad emprcal dstrbuto. As to scalg expoet, o such a oto was troduced the lterature about fractals ad multfractals. Dscrepaces betwee theoretcal value ad emprcally stated value are explaed by lack of data, ad ether more data are demaded or some correctos are made. Such dscrepaces ad errors were wdely dscussed, see e.g. [29], [30], [3]. I [3] correcto factors for dfferet embeddg dmesos are preseted, based o the assumpto of data uformly dstrbuted a cube. There s a geeral objecto that data do ot form a cube ad that cube s, the, too strct a model for edge pheomeo. Moreover, there s a slghtly problematc way of how uform data are geerated, as dscussed above. Here we do ot use such correctos but a estmated emprcal value. The emprcal value s flueced frst by true local scalg expoet of the data geeratg process, ad secod by edge effect caused by lmted amout of data. Correctos dscussed above try to elmate ths fluece. I the case of probablty desty estmato ad classfcato ad other tasks dealg wth eghbor s dstaces oe eeds to work wth the eghbor s dstace dstrbuto as t appears a gve emprcal evromet, ad dealzed lmt case of the umber of data pots gog to fty s foud as mpractcal. Istead, a large umber of expermets wth the same fte amout of data are cosdered; wth the umber of such expermets evetually gog to fty. ow cosder a dstrbuto of kth earest eghbor. Let us use the coeffcet of varato C V that s a ormalzed measure of dsperso of a probablty dstrbuto. The coeffcet of varato C V s defed as the rato of the stadard devato to the mea. For the Erlag dstrbuto of kth order t (results a?) terestg formula for the coeffcet of varato C V / / k. Ths relato shows that relatve spread of VIII. Cocluso Whe usg the oto of dstace, there s a loss of formato o the true dstrbuto of pots the eghborhood of the uery pot. It s kow [7], [8] that for larger dmesos somethg lke local approxmato of real dstrbuto by uform dstrbuto practce does ot exst. We have also show why. O the other had, the assumpto of at least local uformty the eghborhood of a uery pot s usually heret the methods based o the dstaces of eghbors. Itroducg power approxmato of a dstrbuto mappg fucto here solves ths problem. It has bee show that the expoet of the power approxmato s the scalg expoet kow from the theory of multfractals for the umber of pots gog to fty. For fte set, ths expoet cludes boudary effects. By usg the scalg expoet, the real emprcal dstrbuto s trasformed to appear as locally uform. It follows that whe usg expoetally scaled dstace of the kth eghbor the scaled dstace has the Erlag dstrbuto of order k. Ackowledgmet Ths work was supported by Mstry of Educato of the Czech Republc uder IGO project o. LG Refereces [] Duda, R.O., Hart, P.E., Stork, D.G., Patter Classfcato. Secod Edto. Joh Wley ad Sos, Ic., ew York, [2] Slverma, B. W., Desty Estmato for Statstcs ad Data Aalyss. Chapma ad Hall, Lodo, 986. [3] Qamar, A.M., Gausser, E., RELIEF Algorthm ad Smlarty Learg for k. Iteratoal Joural of Computer Iformato Systems ad Idustral Maagemet Applcatos, Vol 4 (202), pp Avalable at [4] per49.pdf [5] Eade, Wt. T. et al., Statstcal Methods Expermetal Physcs. orthhollad, 982.
8 66 Marcel Jra [6] Moore, D.S., Yackel, J.W.: Cosstecy Propertes of earest eghbor Desty Fucto Estmators. Aals of Statstcs, Vol. 5, o., pp (977) [7] Heburg, A., Aggarwal, C.C., Kem, D.A., What s the earest eghbor hgh dmesoal spaces? Proc. of the 26th VLDB Cof., Caro, Egypt, 2000, pp [8] Arya, S., Mout, D.M., araya, O., Accoutg for Boudary Effects earest eghbor Searchg. Dscrete ad Computatoal Geometry, Vol. 6 (996), pp [9] Beyer, K. et al., Whe s earest eghbor Meagful? Proc. of the 7th Iteratoal Coferece o Database Theory. Jerusalem, Israel, 02 Jauary 999, pp [0] Bau, G., Cadre, B., Rouvere, L.: Statstcal Aalyss of kearest eghbor Collaboratve Recommedato. The Aals of Statstcs, Vol. 38, o. 3, pp , (200) [] Stute, W.: Asymptotc ormalty of earest eghbor Regresso Fucto Estmates. The Aals of Statstcs, Vol. 2, o. 3., pp , (984) [2] Rpley, B.D.: Modellg Spatal Patters. Joural of the Royal Statstcal Socety. Seres B (Methodologcal), Vol. 39, o. 2, pp (977) [3] Dxo, P.M.: Rpley s K fucto. I: Abdel H. ElShaaraw ad Walter W. Pegorsch (Eds.): Ecyclopeda of Evrometrcs, Volume 3, pp (ISB ), Joh Wley & Sos, Ltd, Chchester, (2002). [4] Marco, E., Puech, F.: Evaluatg the Geographc Cocetrato of Idustres Usg Dstacebased Methods. Joural of Ecoomc Geography, Vol. 3, o. 4, pp (2003) [5] Evas, D.: A law of large umbers for earest eghbor statstcs. Proc. R. Soc. A., Vol. 464, pp (2008) [6] Boett M, Pagao M.: The terpot dstace dstrbuto as a descrptor of pot patters, wth a applcato to spatal dsease clusterg. Stat. Med. Vol. 24, o. 5, pp (2005) [7] Grassberger, P., Procacca, I.. Measurg the strageess of strage attractors, Physca, Vol. 9D, pp , (983). [8] Slverma, B.W.: Lmt theorems for dssocated radom varables. Advaces Appled Probablty, Vol. 8, pp (976) [9] Madelbrot, B.B.: The Fractal Geometry of ature, W. H. Freema & Co; ISB (982). [20] Chhabra, A., Jese, R.V.: Drect Determato of the f(α) Sgularty Spectrum. Physcal Revew Letters, Vol. 62, o. 2, pp (989) [2] Hu, J., Gao, J. ad Wag, X.: Multfractal aalyss of suspot tme seres: the effects of the year cycle ad Fourer trucato. Joural of Statstcal Mechacs P2066, 20 pp. (2009) [22] Trved, K.S.: Probablty ad Statstcs wth Relablty, Queug, ad Computer Scece Applcatos. PretceHall, Ic., Eglewood Clffs, J 07632, USA, (982) [23] Klerock, L., Queueg Systems, Volume I: Theory. Joh Wley & Sos, ew York, 975. [24] Demaret, J.C., Gareet, A., Sum of Expoetal Radom Varables. AEÜ, Vol. 3, 977, o., pp [25] Ja, M. Classfcato of Multvarate Data Usg Dstrbuto Mappg Expoet. I: Proceedgs of the Iteratoal Coferece Memoram Joh vo euma, pp Budapest: Budapest Muszak Foskola (Budapest Polytechc), ISB [Iteratoal Coferece Memoram Joh vo euma, Budapest, , HU] (2003) [26] Ja, M. Local Estmate of Dstrbuto Mappg Expoet for Classfcato of Multvarate Data. I: Egeerg of Itellget Systems. Mllet : ICSC, s. 6. ISB [Iteratoal ICSC Symposum o Egeerg of Itellget Systems /4./, Madera, , PT] (2004) [27] Madelbrot, B.B., Calvet, L., Fsher, A., 997. A Multfractal Model of Asset Returs. Workg Paper. Yale Uversty. Cowles Foudato Dscusso Paper #64. Avalable from: Papers /cwlcwldpp64.html. [28] Haegg, M.: O Dstaces Uformly Radom etworks. IEEE Tras o Iformato Theory, Vol. 5, o. 0, Oct. 2005, pp [29] Smth, L.A.: Itrsc lmts o dmeso calculatos. Physcs Letters A, Vol. 33, o. 6, pp (988) [30] Theler, J.: Estmatg fractal dmeso. J. Opt. Soc. Am. A, Vol. 7, o. 6, pp (990) [3] Krakovská, A.: Correlato dmeso uderestmato. Acta Physca Slovaca, Vol. 45, o. 5, pp (995) [32] Cover, T.M., Hart, P.E.: earest eghbor Patter Classfcato. IEEE Tras. o Iformato Theory, Vol. IT3, o., pp (967) [33] Fsher,R.A.: The use of multple measuremets taxoomc problems. Aual Eugecs, Vol. 7, Part II, pp (936); also : "Cotrbutos to Mathematcal Statstcs (Joh Wley, Y, 950). [34] Dggle, P.J.: Statstcal Aalyss of Spatal Pot Patters. AROLD, Lodo, [35] Baddeley, A: Spatal pot processes ad ther applcatos. Lecture otes Mathematcs 892, SprgerVerlag, Berl, 2007, pp. 75. [36] Daley, D. J., VereJoes, D. A Itroducto to the Theory of Pot Processes.Vol. I: Elemetary theory ad methods. Sprger, Author Bography Marcel Ja Educato & Work experece: Faculty of Electrcal Egeerg, CTU Prague (962), Faculty of Mathematcs ad Physcs, Charles Uversty Prague (974), Ph.D. (972). Head of Departmet of System Desg at the Research Isttute of Mathematcal Maches Prague ( ), researcher at the Isttute of Computer Applcatos Maagemet, Prague ( ), curretly researcher at the Isttute of Computer Scece of Academy of Sceces CR, at the Departmet of olear Modelg.
Group Nearest Neighbor Queries
Group Nearest Neghbor Queres Dmtrs Papadas Qogmao She Yufe Tao Kyrakos Mouratds Departmet of Computer Scece Hog Kog Uversty of Scece ad Techology Clear Water Bay, Hog Kog {dmtrs, qmshe, kyrakos}@cs.ust.hk
More informationAn Introduction To Error Propagation: Derivation, Meaning and Examples C Y
SWISS FEDERAL INSTITUTE OF TECHNOLOGY LAUSANNE EIDGENÖSSISCHE TECHNISCHE HOCHSCHULE LAUSANNE POLITECNICO FEDERALE DI LOSANNA DÉPARTEMENT DE MICROTECHNIQUE INSTITUT DE SYSTÈMES ROBOTIQUE Autoomous Systems
More informationMust the growth rate decline? Baumol s unbalanced growth revisited
Must the growth rate decle? Baumol s ubalaced growth rested Ncholas Oulto Bak of Eglad, Threadeedle Street, Lodo, ECR 8AH. The ews expressed are those of the author, ot ecessarly those of the Bak of Eglad.
More informationα 2 α 1 β 1 ANTISYMMETRIC WAVEFUNCTIONS: SLATER DETERMINANTS (08/24/14)
ANTISYMMETRI WAVEFUNTIONS: SLATER DETERMINANTS (08/4/4) Wavefuctos that descrbe more tha oe electro must have two characterstc propertes. Frst, scll electros are detcal partcles, the electros coordates
More informationExperiencing SAX: a Novel Symbolic Representation of Time Series
Expereg SAX: a Novel Symbol Represetato of Tme Seres JESSIA LIN jessa@se.gmu.edu Iformato ad Software Egeerg Departmet, George Maso Uversty, Farfax, VA 030 EAMONN KEOGH eamo@s.ur.edu LI WEI wl@s.ur.edu
More informationAlgebraic Point Set Surfaces
Algebrac Pont Set Surfaces Gae l Guennebaud Markus Gross ETH Zurch Fgure : Illustraton of the central features of our algebrac MLS framework From left to rght: effcent handlng of very complex pont sets,
More informationDropout: A Simple Way to Prevent Neural Networks from Overfitting
Journal of Machne Learnng Research 15 (2014) 19291958 Submtted 11/13; Publshed 6/14 Dropout: A Smple Way to Prevent Neural Networks from Overfttng Ntsh Srvastava Geoffrey Hnton Alex Krzhevsky Ilya Sutskever
More informationSOME GEOMETRY IN HIGHDIMENSIONAL SPACES
SOME GEOMETRY IN HIGHDIMENSIONAL SPACES MATH 57A. Itroductio Our geometric ituitio is derived from threedimesioal space. Three coordiates suffice. May objects of iterest i aalysis, however, require far
More informationMANY of the problems that arise in early vision can be
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 2, FEBRUARY 2004 147 What Energy Functons Can Be Mnmzed va Graph Cuts? Vladmr Kolmogorov, Member, IEEE, and Ramn Zabh, Member,
More informationSupport vector domain description
Pattern Recognton Letters 20 (1999) 1191±1199 www.elsever.nl/locate/patrec Support vector doman descrpton Davd M.J. Tax *,1, Robert P.W. Dun Pattern Recognton Group, Faculty of Appled Scence, Delft Unversty
More informationEffect of a spectrum of relaxation times on the capillary thinning of a filament of elastic liquid
J. NonNewtonan Flud Mech., 72 (1997) 31 53 Effect of a spectrum of relaxaton tmes on the capllary thnnng of a flament of elastc lqud V.M. Entov a, E.J. Hnch b, * a Laboratory of Appled Contnuum Mechancs,
More informationBoosting as a Regularized Path to a Maximum Margin Classifier
Journal of Machne Learnng Research 5 (2004) 941 973 Submtted 5/03; Revsed 10/03; Publshed 8/04 Boostng as a Regularzed Path to a Maxmum Margn Classfer Saharon Rosset Data Analytcs Research Group IBM T.J.
More informationStable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation
Stable Dstrbutons, Pseudorandom Generators, Embeddngs, and Data Stream Computaton PIOTR INDYK MIT, Cambrdge, Massachusetts Abstract. In ths artcle, we show several results obtaned by combnng the use of
More informationThe Relationship between Exchange Rates and Stock Prices: Studied in a Multivariate Model Desislava Dimitrova, The College of Wooster
Issues n Poltcal Economy, Vol. 4, August 005 The Relatonshp between Exchange Rates and Stock Prces: Studed n a Multvarate Model Desslava Dmtrova, The College of Wooster In the perod November 00 to February
More informationAssessing health efficiency across countries with a twostep and bootstrap analysis *
Assessng health effcency across countres wth a twostep and bootstrap analyss * Antóno Afonso # $ and Mguel St. Aubyn # February 2007 Abstract We estmate a semparametrc model of health producton process
More informationHOW MANY TIMES SHOULD YOU SHUFFLE A DECK OF CARDS? 1
1 HOW MANY TIMES SHOULD YOU SHUFFLE A DECK OF CARDS? 1 Brad Ma Departmet of Mathematics Harvard Uiversity ABSTRACT I this paper a mathematical model of card shufflig is costructed, ad used to determie
More informationNo Eigenvalues Outside the Support of the Limiting Spectral Distribution of Large Dimensional Sample Covariance Matrices
No igevalues Outside the Support of the Limitig Spectral Distributio of Large Dimesioal Sample Covariace Matrices By Z.D. Bai ad Jack W. Silverstei 2 Natioal Uiversity of Sigapore ad North Carolia State
More informationCiphers with Arbitrary Finite Domains
Cphers wth Arbtrary Fnte Domans John Black 1 and Phllp Rogaway 2 1 Dept. of Computer Scence, Unversty of Nevada, Reno NV 89557, USA, jrb@cs.unr.edu, WWW home page: http://www.cs.unr.edu/~jrb 2 Dept. of
More informationArea distortion of quasiconformal mappings
Area dstorton of quasconformal mappngs K. Astala 1 Introducton A homeomorphsm f : Ω Ω between planar domans Ω and Ω s called Kquasconformal f t s contaned n the Sobolev class W2,loc 1 (Ω) and ts drectonal
More informationBRNO UNIVERSITY OF TECHNOLOGY
BRNO UNIVERSITY OF TECHNOLOGY FACULTY OF INFORMATION TECHNOLOGY DEPARTMENT OF INTELLIGENT SYSTEMS ALGORITHMIC AND MATHEMATICAL PRINCIPLES OF AUTOMATIC NUMBER PLATE RECOGNITION SYSTEMS B.SC. THESIS AUTHOR
More informationSequential DOE via dynamic programming
IIE Transactons (00) 34, 1087 1100 Sequental DOE va dynamc programmng IRAD BENGAL 1 and MICHAEL CARAMANIS 1 Department of Industral Engneerng, Tel Avv Unversty, Ramat Avv, Tel Avv 69978, Israel Emal:
More informationEnsembling Neural Networks: Many Could Be Better Than All
Artfcal Intellgence, 22, vol.37, no.2, pp.239263. @Elsever Ensemblng eural etworks: Many Could Be Better Than All ZhHua Zhou*, Janxn Wu, We Tang atonal Laboratory for ovel Software Technology, anng
More informationConsistency of Random Forests and Other Averaging Classifiers
Joural of Machie Learig Research 9 (2008) 20152033 Submitted 1/08; Revised 5/08; Published 9/08 Cosistecy of Radom Forests ad Other Averagig Classifiers Gérard Biau LSTA & LPMA Uiversité Pierre et Marie
More informationSUPPORT UNION RECOVERY IN HIGHDIMENSIONAL MULTIVARIATE REGRESSION 1
The Aals of Statistics 2011, Vol. 39, No. 1, 1 47 DOI: 10.1214/09AOS776 Istitute of Mathematical Statistics, 2011 SUPPORT UNION RECOVERY IN HIGHDIMENSIONAL MULTIVARIATE REGRESSION 1 BY GUILLAUME OBOZINSKI,
More informationMAXIMUM LIKELIHOODESTIMATION OF DISCRETELY SAMPLED DIFFUSIONS: A CLOSEDFORM APPROXIMATION APPROACH. By Yacine AïtSahalia 1
Ecoometrica, Vol. 7, No. 1 (Jauary, 22), 223 262 MAXIMUM LIKELIHOODESTIMATION OF DISCRETEL SAMPLED DIFFUSIONS: A CLOSEDFORM APPROXIMATION APPROACH By acie AïtSahalia 1 Whe a cotiuoustime diffusio is
More informationUniform topologies on types
Theoretcal Economcs 5 (00), 445 478 555756/000445 Unform topologes on types YChun Chen Department of Economcs, Natonal Unversty of Sngapore Alfredo D Tllo IGIER and Department of Economcs, Unverstà Lug
More informationThe Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
Joural of Machie Learig Research 0 2009 22952328 Submitted 3/09; Revised 5/09; ublished 0/09 The Noparaormal: Semiparametric Estimatio of High Dimesioal Udirected Graphs Ha Liu Joh Lafferty Larry Wasserma
More informationThe Arithmetic of Investment Expenses
Fiacial Aalysts Joural Volume 69 Number 2 2013 CFA Istitute The Arithmetic of Ivestmet Expeses William F. Sharpe Recet regulatory chages have brought a reewed focus o the impact of ivestmet expeses o ivestors
More informationDo Firms Maximize? Evidence from Professional Football
Do Frms Maxmze? Evdence from Professonal Football Davd Romer Unversty of Calforna, Berkeley and Natonal Bureau of Economc Research Ths paper examnes a sngle, narrow decson the choce on fourth down n the
More informationWho are you with and Where are you going?
Who are you wth and Where are you gong? Kota Yamaguch Alexander C. Berg Lus E. Ortz Tamara L. Berg Stony Brook Unversty Stony Brook Unversty, NY 11794, USA {kyamagu, aberg, leortz, tlberg}@cs.stonybrook.edu
More information