Utility-Driven Graph Summarization

K. Ashwin Kumar, Symantec Research Labs
Petros Efstathopoulos, Symantec Research Labs

ABSTRACT

A lot of the large datasets analyzed today represent graphs. In many real-world applications, summarizing large graphs is beneficial (or necessary) so as to reduce a graph's size and, thus, achieve a number of benefits, including but not limited to 1) significant speed-up for graph algorithms, 2) graph storage space reduction, 3) faster network transmission, 4) improved data privacy, 5) more effective graph visualization, etc. During the summarization process, potentially useful information is removed from the graph (nodes and edges are removed or transformed). Consequently, one important problem with graph summarization is that, although it reduces the size of the input graph, it also adversely affects and reduces its utility. The key question that we pose in this paper is: can we summarize and compress a graph while ensuring that its utility or usefulness does not drop below a certain user-specified utility threshold? We explore this question and propose a novel iterative utility-driven graph summarization approach. During iterative summarization, we incrementally keep track of the utility of the graph summary. This enables a user to query a graph summary that is conditioned on a user-specified utility value. We present both exhaustive and scalable approaches for implementing our proposed solution. Our experimental results on real-world graph datasets show the effectiveness of our proposed approach. Finally, through multiple real-world applications we demonstrate the practicality of our notion of utility of the computed graph summary.

PVLDB Reference Format:
K. Ashwin Kumar, Petros Efstathopoulos. Utility-Driven Graph Summarization. PVLDB, 12(4): 335-347, 2018.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, permission must be obtained. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment.

1. INTRODUCTION

A lot of the vast amounts of information we are producing and analyzing today can be represented as graphs. This fact becomes clear if one considers all the real-life data networks that can be abstractly perceived as nodes connected by edges: social networks, financial transaction networks, communication networks, citation networks, parcel shipment data, protein-protein interaction networks, gene regulatory networks, disease transmission networks, ecological food networks, and sensor networks, just to name a few. The size of such graphs is growing at an unprecedented rate, spanning millions and billions of nodes and edges. For instance, Google stores more than a trillion indexed pages that contain billions of incoming and outgoing links. Similarly, Facebook has hundreds of millions of active users and related network data. At the current rate of data volume increase, it is becoming highly impractical to store, process, analyze, and visualize these big graphs. Therefore, in order to make graph data management, processing and visualization tractable, summarization techniques are becoming increasingly important.

There is a plethora of benefits to employing graph summarization methods. First, given the planetary scales of real-world graphs [8], graph summarization helps in reducing the size of the graph, thereby reducing the on-disk storage footprint. The reduced graph can also be loaded directly into memory to improve the performance of analytics algorithms [5]. Second, many graph algorithms that are otherwise too complex or costly to run on larger graphs can be efficiently executed on summary graphs, with adequately accurate results [6]. Third, most real-world graphs suffer from the small-world effect, which makes them look too tangled to be effectively visualized and interpreted, resulting in the hairball graph phenomenon. Graph summarization makes them simpler to visualize on a small screen, in turn helping with better analysis of these graphs [, 5,,, ]. Finally, when the original data is privacy sensitive, graph summarization may help conceal private information [], thus enabling privacy-preserving analytics, especially among multiple mutually-distrustful parties.

A key challenge with graph summarization is that it can have a severe impact on the amount of useful information represented by the graph for the task at hand, i.e., the utility of the graph. Furthermore, it is difficult to predict the reduction in utility a graph will suffer when summarized. Ideally, we should be able to estimate the utility at each summarization step so that the obtained graph summary meets a user-specified utility threshold.
To the best of our knowledge, state-of-the-art graph summarization approaches [, 5] focus primarily on minimizing graph reconstruction error, and largely ignore the utility aspect, where the relative importance of nodes and edges should be considered during the summarization process. To address this gap, we pose the following key question: Can we summarize a graph and compress it as much as possible, while ensuring that its utility does not drop below a user-defined utility threshold? In other words, we desire a graph summarization system that permits a user to query a graph summary with a given utility. To achieve this, our summarization algorithm must be able to keep track of the utility of the graph at each step of the summarization process. Moreover, we need utility estimation to be inexpensive and yet faithfully represent certain important properties of the underlying graph which we want to retain in the computed graph summary.

In our effort to achieve these goals, we evaluated various graph summarization techniques that have been proposed. In the sparsification approach, edges are filtered based on certain criteria to simplify the underlying graph. On the other hand, the sampling approach performs sampling of a subset of nodes or edges so as to form a simpler representation of the original graph. The most popular approaches, however, are different variants of the grouping approach, which employ meaningful grouping of nodes into supernodes and edges into superedges to compute a graph summary. Grouping approaches owe their popularity to the fact that they are expressive enough to allow a user to logically explain the computed graph summary with respect to the original underlying graph. Moreover, iterative grouping approaches allow us to record the list of corrections made across the iterations, which can help us to reconstruct the exact original graph, or an approximate version of it, from the summary if needed. This, in turn, helps with provenance and explainability, where one can explain the steps taken to reach a particular summary for a given graph (useful for forensics, anomaly detection, etc.). Also, the iterative nature of the algorithm (grouping of nodes into supernodes and edges into superedges and vice versa) enables meaningful visualization and complex analysis during the summarization process. Therefore, for all the benefits it provides, we specifically focus on iterative grouping-based graph summarization approaches. However, since grouping-based graph summarization with minimum reconstruction error is shown to be NP-Hard [5], it is common to use heuristics and approximations to implement such algorithms.

In this paper, we propose a novel utility-driven graph summarization (UDS) technique, where graph utility is incrementally computed while iteratively performing the summarization. This allows us to obtain a summary with a user-specified utility threshold, thus offering the benefits of summarization while providing utility guarantees. Our contributions in this work are as follows:

1. We introduce a new framework to measure the utility of a graph while it is being perturbed by the deletion of existing edges or the addition of spurious edges. Furthermore, we judiciously extend it to compute utility for graph summaries.
2. We present a theoretical result showing the computational intractability of the problem of obtaining a near-optimal solution.
3. We introduce a novel algorithm, UDS, that iteratively summarizes a given graph by employing an objective function that maximizes the utility at each step of the transformation. Also, during iterative summarization of the graph, UDS incrementally computes and keeps track of the running utility value.
4. We improve scalability by orders of magnitude by proposing a memoization-based approach for UDS.
5. We conduct a comprehensive experimental study using several real datasets and applications, and the results demonstrate that UDS is capable of generating high-utility graph summaries.

The rest of the paper is organized as follows: In Section 2, we present the relevant background and the different concepts discussed in this paper. In Section 3, we present the formal definition of utility, describe the set of properties and conditions that a desirable utility metric should satisfy, and introduce a generic framework to estimate the utility of a perturbed graph given its base graph. We present our approach UDS in Section 4.1. We describe how we use memoization to improve the scalability of our technique in Section 4.2. In Section 5 we present experimental results evaluating the efficiency and effectiveness of UDS. Finally, we present related work in Section 6 and conclude in Section 7.

2. PRELIMINARIES

In this section, we present the background for graph summarization and the different concepts discussed in this paper.

Graph Summary. Given a graph G = (V, E), its graph summary is G_S = (V_S, E_S), where V_S = {S_1, S_2, ..., S_k} is a set of supernodes such that k < |V|. If u ∈ V and v ∈ V, then S_u represents the supernode containing node u and S_uv represents the supernode containing both the nodes u and v. Essentially, V_S consists of disjoint sets (supernodes) of nodes in V such that $V = \bigcup_{i=1}^{k} S_i$ and $S_i \cap S_j = \emptyset$ for all $i \neq j$. In G_S, the edges E_{N_S} ⊆ E connecting the set of nodes N_S belonging to a particular supernode S are not maintained; only edges connecting individual supernodes are maintained. Also, if supernodes S_i and S_j are connected with a superedge, then A_{i,j} represents the actual cross edges connecting the nodes in S_i and S_j. On the other hand, Π_{i,j} denotes the bipartite graph connecting the nodes in supernodes S_i and S_j, where S_i, S_j ∈ V_S. Alternative notations for A_{i,j} and Π_{i,j} that we use in this paper are A_{S_u,S_v} and Π_{S_u,S_v}, where u ∈ V and v ∈ V. Also, in this work, we assume undirected, unweighted and edge-unlabeled graphs.

Reduction in Nodes (RN). We study the effectiveness of our proposed techniques under varying RN. Formally, $RN = \frac{|V| - |V_S|}{|V|}$, where a value of 0.2 means 20% of the original nodes are collapsed into supernodes and the summary retains 80% of the graph unmodified.

Zero Loss Encoding Transformations.
We define certain encoding transformations (as shown in Figure 1) used to represent a group of nodes and edges in a graph G with supernodes and superedges in the summarized graph G_S without loss of information.

Rule 1: a group of nodes that are not connected to each other in the graph G is simply represented by a supernode without a self-loop.
Rule 2: a group of nodes that form a clique in graph G is represented by a supernode with a self-loop.
Rule 3: if there is an all-to-all connection between two sets of nodes, then they are represented by two supernodes connected with a single superedge.

Zero loss in this context means that if we apply these transformations in reverse order on a graph summary, then we should be able to obtain the original graph without needing any additional information or corrections. Note that in this context, zero loss also implies 100% utility because the transformations are able to preserve all the salient regions of G. We make use of these transformations during summarization and calculation of utility (Sections 3 and 4.1).

Figure 1: Examples of three encoding rules for zero-loss summarization (graph G and its summary G_S).

Utility (EU). The utility EU of any graph G_S that is obtained by transforming a graph G indicates the usefulness of G_S with respect to G. The higher the extent to which important regions in G are preserved in the transformed graph, the greater the utility.

Example of Utility-Driven Graph Summarization (UDS). Let us consider an example. Figure 2 presents iterations of a desirable UDS system. We envision a summarization system that reports at each iteration the current EU and RN values of the graph summary G_S. Figure 2 offers the values for EU and RN, whose calculation is discussed in detail in the coming sections. The input graph is shown in Figure 2(a). The user-provided utility threshold Γ_U is presented to the system, indicating that the summary G_S should have utility no less than Γ_U. In this example, let's say Γ_U equals 0.9. Figures 2(b) to 2(h) show the first seven iterations of graph summarization with varying EU and RN values along the way. At every iteration, a pair of nodes is selected and collapsed to form a supernode, and neighboring edges are adjusted accordingly. The summarization system analyzes the important parts and regions of the input graph (i.e., the output of the previous iteration) and prioritizes the order in which nodes are collapsed accordingly. In every iteration, the objective is to preserve the important regions of G as much as possible in G_S.

In the first iteration, two nodes are collapsed into a supernode, and edges are adjusted accordingly.

Figure 2: Example output of utility-driven graph summarization. Panel (a) shows the input graph; panels (b) to (h) show successive summaries, each annotated with its EU and RN values.

Note that, in this iteration, the EU value remains 1 because we can still reconstruct G from G_S by simply applying the decoding rules of Figure 1. Also, RN becomes nonzero, since the number of nodes in G_S is reduced by one. By the end of the second iteration, EU remains 1. In the third iteration, however, EU is reduced to 0.99, since reconstructing G from the G_S produced in this step will introduce spurious edges. The reduction in EU is 0.01, based on the extent to which important regions are affected in G. Similarly, all iterations from 3 to 7 cause a drop in EU. Note that the quantum of reduction in EU from (d) to (e) is less than that from (f) to (g). This is because the merge step at (e) preserves important regions better than the merge step at (g). This will be explained in detail in the coming sections. Overall, the algorithm terminates at the seventh iteration (h), as any attempt to further summarize the graph would cause the EU to drop below the user-specified threshold Γ_U = 0.9. Finally, the computed graph summary G_S (in Figure 2(h)) is presented to the user as the output.

3. UTILITY OF A GRAPH SUMMARY

The fulcrum of this work is our proposed method for calculating the utility of a graph summary with respect to an underlying graph. We approach this problem by attempting to reconstruct the original graph G from the summary G_S with no extra information. For reconstruction, we apply the reverse of the transformations discussed in Section 2. This can result in the loss of original edges as well as the introduction of spurious edges. Supernodes with self-loops are expanded into a clique of their contained nodes; otherwise they are expanded into disconnected nodes. A pair of sets of base nodes forms a complete bipartite graph if the corresponding supernodes are connected by a superedge; otherwise they are completely disconnected. More formally, given G_S of a graph G, we reconstruct the graph G' = (V', E') from G_S such that V' = V. The node set of G' is the same as that of G, although the number of edges might vary, primarily due to the error introduced by graph summarization. Figure 3 presents an example of a graph G, its summarization G_S, and the graph G' which is reconstructed from G_S by applying the rules shown in Figure 1 in the reverse order.
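The reconstruction step described above can be illustrated with a minimal Python sketch. The function name `reconstruct`, the dictionary-based summary representation and the toy graph are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import combinations

def reconstruct(supernodes, superedges):
    """Expand a summary into the edge set E' of the reconstructed graph G'.

    supernodes: dict mapping a supernode id to the set of base nodes it contains.
    superedges: iterable of (id_a, id_b); a pair with id_a == id_b is a self-loop.
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            # Self-loop: the contained nodes are expanded into a clique.
            pairs = combinations(supernodes[a], 2)
        else:
            # Superedge: all-to-all (complete bipartite) connection.
            pairs = ((u, v) for u in supernodes[a] for v in supernodes[b])
        edges.update(frozenset(p) for p in pairs)
    return edges

# Hypothetical summary: {1, 2, 3} collapsed with a self-loop and joined to {4, 5}.
supernodes = {"A": {1, 2, 3}, "B": {4, 5}}
superedges = [("A", "A"), ("A", "B")]
original = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 5)]}

rebuilt = reconstruct(supernodes, superedges)
missing = original - rebuilt    # original edges lost by the summary
spurious = rebuilt - original   # edges that did not exist in G
```

In this toy case no original edge is lost, but the superedge between the two supernodes introduces spurious edges, which is exactly the kind of perturbation the utility function has to penalize.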
Figure 3: Example of a graph G, its summary G_S, and its reconstruction G'.

Once G_S is transformed into a reconstructed graph G', the problem of calculating the utility of a graph summary is reduced to the problem of calculating the utility of G' with respect to G, using a utility function denoted by EU(G')_G. In essence, the greater the structural similarity between G and the reconstructed graph G' (i.e., the extent to which important edges and regions in G are preserved in G'), the higher the utility of G_S. The reconstructed graph G' obtained from G_S is equivalent to a graph G' obtained by perturbing G (by adding certain spurious edges, or removing original edges, or both). Therefore, from now on we will call the reconstructed graph the perturbed graph. Next, we present a generic framework to calculate the utility of G' with respect to G.

Generic Framework for Graph Utility Function. Our key intuition is to penalize the utility of graph G' in accordance with the introduced perturbations. The amount of cost or penalty should be based on the importance of edges that are missing, or the number of spurious edges introduced, or both. An intuitive way to assess the relative importance of edges in the original graph G is by computing normalized edge centrality scores edgeIS. If {E − E'} is the set of edges missing from G' compared to G's original edges, then the utility of G' is penalized by the sum of the relative importance scores of the missing edges. Next, we should penalize the utility of G' according to any spurious edges it contains that did not exist in G. We do this by calculating the proportion of spurious edges introduced in G' to the total number of spurious edges possible in the base graph G. More formally, the maximum number of spurious edges that can be introduced in G is $\binom{|V|}{2} - |E|$. If {E' − E} is the set of spurious edges introduced in G', and assuming homogeneity, then for each spurious edge the utility EU(G')_G is penalized by the amount $\frac{1}{\binom{|V|}{2} - |E|}$.

Algorithm 1 Generic Graph Utility Function (GGUF)
  procedure GGUF(G = (V, E), G' = (V, E'))
      utility = 1
      edgeIS = normalize(edge centrality scores(G))
      if G is non-empty and G is not a clique then
          for e ∈ {E − E'} do
              utility = utility − edgeIS[e]
          end for
          for e ∈ {E' − E} do
              penalty = 1 / (C(|V|, 2) − |E|)
              if penalty < utility then
                  utility = utility − penalty
              else
                  utility = 0
              end if
          end for
      end if
      return utility
  end procedure

The value of utility is in the range [0, 1]. Given a non-empty and non-clique graph G, there are four notable conditions under which the utility of G' is zero: 1) if G' is an empty graph, 2) if G' is a clique, 3) if G' is missing all the original edges, and 4) if G' contains all the possible spurious edges. Pseudocode for the generic graph utility function GGUF is shown in Algorithm 1. Without loss of generality, it can be easily extended to weighted graphs, where penalties will be weight adjusted. Moreover, the generic nature of GGUF allows us to plug in a variety of centrality metrics to form different types of utility functions, each exhibiting different properties. Next, we identify certain intuitive properties a utility function should exhibit and discuss how to assess its desirability.

Assessing the Desirability of a Utility Function

To make a utility function aware of the important regions of G that are preserved in G', we use a set of fairly intuitive properties, described in Table 1, that a desirable graph utility metric should exhibit. The key motivation in defining these properties and imposing necessary conditions for a desirable utility metric is that the maximization of such a utility metric during summarization should help maintain the results of important graph algorithms, such as ranking and community detection.

Table 1: Properties of a desirable Graph Utility Function
Criteria | Property | Description
C1 | Edge Importance | Changes that create disconnected components or weaken the connectivity should be penalized more than changes that maintain the connectivity properties of the graph.
C2 | Spurious Edge Awareness | More spurious edges must lead to lower utility.
C3 | Weight Awareness | In weighted graphs, the higher the weight of the removed edge or added spurious edge, the greater the impact on the similarity measure should be.
C4 | Edge Submodularity | A specific change is more important in a graph with fewer edges than in a much denser graph.

To further explain these properties and test the desirability of a utility function, we use the example model graphs shown in Figure 4, with various shapes and varying numbers of missing edges, such as: clique, path, cycle, barbell, wheel barbell, etc. Note that these examples are not exhaustive and are only meant to explain the key concepts. Also, it is not necessary that a desirable utility function exhibits all the listed properties in conjunction; it is only required to exhibit each property independently. We present an example test criterion that uses the shown model graphs to test if a utility function exhibits the desired property, in this case criterion C1.

Example Test Criteria. Consider barbell graphs B_n, mB_n and mmB_n to explain C1, the edge importance criterion. Graph B_n has two cliques of size n_1 and n_2, such that n = n_1 + n_2. Graph mB_n has an edge removed from one of the cliques in B_n, whereas graph mmB_n is missing a bridge edge from B_n.
In this case, according to the edge importance criterion C1, the following should be satisfied:

$$EU(mB_n)_{B_n} - EU(mmB_n)_{B_n} > 0 \qquad (1)$$

Figure 4: Model synthetic graphs used to validate the utility function. K_n: clique of size n, P_n: path of size n, C_n: cycle of size n, L_n: lollipop of size n, B_n: barbell of size n, WB_n: wheel barbell of size n, m X: missing X edges, and mm X: missing X bridge edges.

Similarly, additional example test criteria are presented in Section 5 to test if a utility function exhibits the remaining desired properties. Also, in Section 5 we present experimental results where we try various centrality metrics in GGUF and provide guidelines for the right set of centrality metrics to be plugged in, so as to create a utility function that exhibits the properties of Table 1.

Discussion. Calculating the utility EU(G')_G and measuring structural similarity through simple graph edit distance (GED) between G and G' may seem similar, but they differ in significant ways. GED essentially counts the number of edges that differ between the original graph and the graph reconstructed from the graph summary. GED does not differentiate non-important regions from important regions in the graph, as GGUF does. Moreover, in simple GED the cost of edit operations is fixed, whereas in our case the cost of edits is dynamic and depends on the structure of the original graph. Also, simple GED violates certain key properties that our utility function satisfies. For example, consider a dense, non-clique graph G = (V, E), so that $|E| < \binom{|V|}{2}$, and let G' = (V, E') be its perturbed graph that is a clique with $|E'| = \binom{|V|}{2}$ edges. Then the utility of G' with respect to G is zero (lowest) according to GGUF (Algorithm 1), but simple GED would calculate a utility value > 0, where the utility is derived from the number of edits (the distance) relative to |E|. Intuitively, a utility value of zero is desirable in this case: if G is non-clique and non-empty, then no matter how dense G is, if G' is a clique, then G' does not reveal any information with respect to G, thus rendering its utility equal to zero. We note that simple GED violates all but one of the desired properties of an ideal utility function (Table 1), whereas GGUF, when plugged with an appropriate centrality metric, satisfies all the properties.
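As an illustration of the generic utility function and of test criterion C1, the following Python sketch plugs edge betweenness into the framework and evaluates it on a small barbell graph. The choice of edge betweenness, the helper names `edge_betweenness` and `gguf`, and the six-node barbell are assumptions made for this example; the paper leaves the centrality metric pluggable.

```python
from collections import deque
from itertools import combinations

def edge_betweenness(nodes, edges):
    """Unnormalized edge betweenness (Brandes) for an undirected, unweighted graph."""
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    score = {e: 0.0 for e in edges}
    for s in nodes:
        dist, sigma = {s: 0}, {v: 0 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s] = 1
        order, queue = [], deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):         # accumulate dependencies back to s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return score

def gguf(nodes, edges, perturbed_edges):
    """Utility of the perturbed edge set with respect to the base graph."""
    n, m = len(nodes), len(edges)
    max_spurious = n * (n - 1) // 2 - m
    if m == 0 or max_spurious == 0:
        raise ValueError("base graph must be non-empty and must not be a clique")
    raw = edge_betweenness(nodes, edges)
    total = sum(raw.values())
    edge_is = {e: s / total for e, s in raw.items()}   # normalized importance
    utility = 1.0
    for e in edges - perturbed_edges:                  # missing original edges
        utility -= edge_is[e]
    utility -= len(perturbed_edges - edges) / max_spurious   # spurious edges
    return max(utility, 0.0)

# Barbell: two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge (2, 3).
nodes = list(range(6))
barbell = {frozenset(e) for e in
           [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
eu_mb = gguf(nodes, barbell, barbell - {frozenset((0, 1))})    # clique edge removed
eu_mmb = gguf(nodes, barbell, barbell - {frozenset((2, 3))})   # bridge removed
eu_clique = gguf(nodes, barbell, {frozenset(p) for p in combinations(nodes, 2)})
```

Under these assumptions, removing the bridge costs far more utility than removing an edge inside a clique, which is the behaviour Equation 1 asks for, and perturbing the graph into a clique drives the utility to zero.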
An experiment demonstrating that simple GED violates these properties while GGUF satisfies them is included in Table 5. We also note that the simplicity of GGUF permits us to easily extend it so as to incrementally calculate the utility of G' while it is being perturbed. In this case, we start with a utility of 1, which represents G' = G. As we perturb G' by deleting edges, adding spurious edges, or both, we penalize the utility accordingly by subtracting the appropriate cost. Similarly, we incrementally calculate the utility of G_S at each iteration, by analyzing the possible perturbations without actually generating the reconstructed graph at each summary step.

4. UTILITY-DRIVEN SUMMARIZATION

We begin our discussion by presenting the mathematical formulation of our problem. Given a graph G = (V, E) and a utility threshold Γ_U, we want to summarize the graph G as much as possible by grouping nodes into a minimum number of supernodes V_S and forming superedges E_S between supernodes, such that the difference between the total utility of retained actual edges and the total penalty of introduced spurious edges is very close to the given Γ_U. Initially, each node in the original graph is its own supernode in the summary graph.

$$\text{minimize } |V_S| \qquad (2)$$

subject to

$$\sum_{(S_i, S_j) \in E_S} \left( \sum_{e \in A_{i,j}} edgeIS[e] \; - \; \frac{|\Pi_{i,j}| - |A_{i,j}|}{\binom{|V|}{2} - |E|} \right) \geq \Gamma_U \qquad (3)$$

Since the problem of graph summarization is shown to be NP-Hard [5], one may be interested in obtaining a partition that is a p-approximation for some p > 1. However, a computational intractability result for obtaining a near-optimal partition can be established as follows.

Theorem 1. [No Efficient Approximation Theorem] For any ε > 0, there is no $O(n^{1-\varepsilon})$-approximation for the problem of obtaining a feasible graph summarization with a minimum number of supernodes for a given utility threshold, unless NP = ZPP.

This intractability result is based on the widely believed assumption that the complexity classes NP and ZPP are different [9].

PROOF. Davidson et al. [] have proved, for the problem of obtaining a feasible clustering with a minimum number of clusters under cannot-link (CL) constraints, that if for some ε > 0 there exists an $O(n^{1-\varepsilon})$-approximation for the feasibility problem, then that would imply NP = ZPP. Here, CL constraints involve data points that are required to be in different clusters. Following this result, we directly reduce the problem of obtaining a feasible clustering with a minimum number of clusters to our problem to prove the result. Consider a set of data points D = {d_1, d_2, ..., d_|V|}. Let E_{i,j} be the measure of distance between data points, where E_{i,j} > 0 represents points that are relatively closer to each other and E_{i,j} = 0 otherwise. Let V_S be the set of clusters of data points. Initially each data point is its own cluster S_i. It is straightforward to see that the data points D with pairwise distance values represent a graph G = (V, E), where values E_{i,j} > 0 represent edge-weighted graphs, and they represent edge-unweighted graphs if the E_{i,j} take a value of 0 or 1. The values of A_{i,j}, Π_{i,j}, and edgeIS can be calculated based on the E_{i,j} values. Since these values are defined over data points that are in different clusters, the constraint (Equation 3) using these values is essentially formed by a set of CL constraints. The objective of minimizing the number of clusters |V_S| for a given set of CL constraints on pairs of data points can be directly mapped to the objective of grouping the nodes of the original graph into a minimum number of supernodes, subject to the set of constraints involving pairs of nodes in different supernodes, as shown in Equations 2 and 3. This completes the proof.

Because it is not possible to devise a feasible or efficient approximation algorithm for the problem at hand, we instead rely on greedy heuristics that make a best effort at each step taken.

4.1 Iterative Greedy UDS

We present a novel iterative greedy algorithm with an incremental utility update. Our primary goal is to summarize the given graph G so as to compress it to an extent such that the utility of the summary graph G_S does not drop below a user-specified threshold Γ_U. To compose our algorithm we need to determine the following steps, based on principles presented in the previous section: 1) introduce a strategy for grouping nodes, 2) find an iterative, utility-driven summarization recipe, 3) come up with appropriate superedge connectivity criteria, 4) present techniques to incrementally keep track of utility, and 5) optimize the algorithm's performance and scalability.
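Before turning to the individual steps, it may help to see how the utility constraint of the problem formulation can be checked for one candidate summary without materializing the reconstructed graph. The sketch below is a rough Python illustration under stated assumptions: the function name `summary_utility`, the dictionary layout, the uniform importance scores, and the treatment of a self-loop as covering all node pairs inside one supernode are all hypothetical choices made for the example.

```python
from itertools import combinations

def summary_utility(supernodes, superedges, edge_is, num_nodes):
    """Retained edge importance minus spurious-edge penalty, summed over superedges.

    edge_is maps frozenset({u, v}) to a normalized importance score (summing to 1),
    so its keys are exactly the original edges E.
    """
    max_spurious = num_nodes * (num_nodes - 1) // 2 - len(edge_is)
    total = 0.0
    for a, b in superedges:
        if a == b:
            # Assumption: a self-loop covers every node pair inside the supernode.
            pairs = [frozenset(p) for p in combinations(supernodes[a], 2)]
        else:
            pairs = [frozenset((u, v)) for u in supernodes[a] for v in supernodes[b]]
        actual = [p for p in pairs if p in edge_is]
        total += sum(edge_is[p] for p in actual)                 # retained edges
        total -= (len(pairs) - len(actual)) / max_spurious       # spurious edges
    return total

# Toy graph on nodes 1..4 with uniform importance scores (hypothetical values).
edge_is = {frozenset(e): 0.25 for e in [(1, 2), (1, 3), (2, 3), (3, 4)]}
supernodes = {"A": {1, 2, 3}, "B": {4}}

with_superedge = summary_utility(supernodes, [("A", "A"), ("A", "B")], edge_is, 4)
without_superedge = summary_utility(supernodes, [("A", "A")], edge_is, 4)
threshold = 0.7
feasible = [u >= threshold for u in (with_superedge, without_superedge)]
```

In this toy case, adding the superedge between the two supernodes retains one more original edge but introduces every possible spurious edge, so only the summary without that superedge satisfies the threshold.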
Prioritizing Candidates to Merge. One way to prioritize the merging of nodes is by considering edge importance. The goal is to pick an edge e with the lowest importance and merge the nodes u and v at its end-points so as to form a supernode w. However, this approach completely forgoes the benefit of merging nodes that are indirectly connected to each other. Many times, collapsing nodes that are not directly connected and forming appropriate superedges might result in higher utility. For example, it is often beneficial to collapse nodes that have many common neighbors [5], even if they are not directly connected. Therefore, at each step, we consider pairs of nodes that are either 1) directly connected by an edge, or 2) indirectly (2-hop) connected via common neighbors, as candidates to form supernodes.

Given a list of both 1-hop and 2-hop connected node pairs, we seek to prioritize or sort this list in ascending order of importance. We calculate the normalized node centrality scores nodeIS for the nodes in the base graph and then calculate the combined importance score for a node pair p = <u, v> as the sum of a function of the normalized centrality scores of the nodes, given by f(nodeIS[u]) + f(nodeIS[v]). In our implementation we use the square function as f(), as it helps in further delaying the merging of important nodes with relatively less important ones. Let H be the list of node pairs sorted by their combined importance scores. Also, let edgeIS be the map that maps each edge to its importance score. An edge importance score is calculated as the normalized edge centrality.

Iterative Greedy Summarization. As shown in Algorithm 2, we initially map each node in the base graph G to a unique supernode in the summarized graph G_S. All edge connections between nodes in G are maintained between the corresponding supernodes in G_S. At each algorithm step, we pick from H the node pair (u, v) with the lowest importance score. Unless nodes u and v already belong to the same supernode (S_u = S_v), their corresponding supernodes S_u and S_v are collapsed into a supernode S_uv. Let V_{S_uv} indicate the set of nodes in G belonging to a particular supernode S_uv. We calculate the set of potential neighbors η_{S_uv} of S_uv in G_S by finding the set of 1-hop neighbors of all nodes belonging to V_{S_uv} in G and by calculating their corresponding supernodes in G_S. For every unique potential neighbor S_n ∈ η_{S_uv}, where n ∈ G, we need to decide whether connecting S_uv and S_n with a superedge is beneficial for utility.
We commit the decisions for all the potential neighbors in ascending order of the calculated penalties. Procedure connectSuperedge(...) (pseudocode in Algorithm 3, discussed later) returns true if the given pair of supernodes should be connected by a superedge, or false otherwise. For a pair of supernodes this procedure calculates 1) secost: the penalty to connect them with a superedge, and 2) nsecost: the penalty to not connect them. If connectivity is deemed beneficial, i.e., secost < nsecost, then S_uv and S_n are connected through a superedge. Subsequently, the utility is updated by subtracting the corresponding penalty values, and all the previous connections between (S_n, S_u) and (S_n, S_v) are removed from G_S. This particular way of connectivity decision making guides the summarization algorithm so as to maximize the utility of G_S at each summarization step. Similarly, the decision to self-connect supernode S_uv or not is made based on utility maximization: if a self-loop is deemed beneficial, then a self-connection (S_uv, S_uv) is added to the set of superedges E_S. In the next iteration, the node pair from H with the next lowest importance score is evaluated.

Algorithm 2 Utility-Driven Graph Summarization
   1: procedure UDSUMMARIZER(G = (V, E), Γ_U)
   2:     Initialize: utility = 1; V_S = {u : {u} ∀u ∈ V}; E_S = {({u}, {v}) ∀(u, v) ∈ E}; S = {u : u ∀u ∈ V}
   3:     nodeIS, edgeIS = normalize(centrality scores(G))
   4:     P_2hop = {(a, c) | (a, b) ∈ E, (b, c) ∈ E}
   5:     H = sort(P_2hop by (f(nodeIS[a]) + f(nodeIS[c])), ∀(a, c) ∈ P_2hop)
   6:     while utility ≥ Γ_U and H ≠ ∅ do
   7:         (u, v) = H.pop()
   8:         if S_u ≠ S_v then
   9:             S_uv = {S_u ∪ S_v}
  10:             V_S = {V_S ∪ S_uv} − {S_u, S_v}
  11:             η_{S_uv} = supernodes in V_S containing a 1-hop neighbor (in G) of a node in S_uv, excluding S_u and S_v
  12:             for S_n ∈ η_{S_uv} do
  13:                 bool, penalty = connectSuperedge(S_uv, S_n, G, edgeIS)
  14:                 η_{S_uv}[S_n].connect = bool
  15:                 η_{S_uv}[S_n].penalty = penalty
  16:             end for
  17:             η_{S_uv} = sort(η_{S_uv} by penalty)
  18:             for S_n ∈ η_{S_uv} do
  19:                 if η_{S_uv}[S_n].connect is true then
  20:                     E_S = {E_S ∪ (S_uv, S_n)} − {(S_n, S_u), (S_n, S_v)}
  21:                 end if
  22:                 utility = utility − η_{S_uv}[S_n].penalty
  23:                 return G_S if utility < Γ_U
  24:             end for
  25:             connect, penalty = connectSuperedge(S_uv, S_uv, G, edgeIS)
  26:             if connect is true then
  27:                 E_S = E_S ∪ (S_uv, S_uv)
  28:             end if
  29:             utility = utility − penalty
  30:         end if
  31:     end while
  32:     return G_S = (V_S, E_S)
  33: end procedure

The algorithm terminates and returns the final G_S when the current utility of G_S reaches the utility threshold Γ_U or when all node pairs have been evaluated.
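Two building blocks of the greedy procedure above, candidate prioritization and the connect-or-not decision, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: normalized degree stands in for the node centrality, edge scores are uniform, the function names `candidate_pairs` and `connect_decision` are hypothetical, and the decision is non-incremental (it ignores penalties already charged in earlier iterations).

```python
from itertools import combinations

def candidate_pairs(adj, node_is):
    """1-hop and 2-hop node pairs, least important first.

    adj: dict node -> set of neighbours; node_is: dict node -> normalized centrality.
    """
    pairs = set()
    for b, nbrs in adj.items():
        for a in nbrs:
            pairs.add(frozenset((a, b)))                             # directly connected
            pairs.update(frozenset((a, c)) for c in nbrs if c != a)  # share neighbour b
    score = lambda p: sum(node_is[v] ** 2 for v in p)                # f(x) = x^2
    return sorted(pairs, key=lambda p: (score(p), sorted(p)))

def connect_decision(nodes_a, nodes_b, edge_is, spurious_penalty):
    """Return (connect, penalty) by comparing secost with nsecost."""
    if nodes_a == nodes_b:
        pairs = [frozenset(p) for p in combinations(nodes_a, 2)]     # self-loop case
    else:
        pairs = [frozenset((u, v)) for u in nodes_a for v in nodes_b]
    nsecost = sum(edge_is.get(p, 0.0) for p in pairs)                # lost original edges
    secost = spurious_penalty * sum(1 for p in pairs if p not in edge_is)
    connect = secost < nsecost
    return connect, (secost if connect else nsecost)

# Toy graph with edges (1,2), (1,3), (2,3), (3,4); all scores are hypothetical.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
node_is = {v: len(n) / 8 for v, n in adj.items()}        # normalized degree
edge_is = {frozenset(e): 0.25 for e in [(1, 2), (1, 3), (2, 3), (3, 4)]}
penalty_per_spurious = 1 / (4 * 3 // 2 - len(edge_is))   # 1 / (C(|V|, 2) - |E|)

ranked = candidate_pairs(adj, node_is)
to_hub = connect_decision({1, 2}, {3}, edge_is, penalty_per_spurious)
to_leaf = connect_decision({1, 2}, {4}, edge_is, penalty_per_spurious)
```

In this toy graph the first candidate is a 2-hop pair of low-centrality nodes, the merged pair {1, 2} is connected to node 3 at no cost, and it is left disconnected from node 4 because a superedge there would consist only of spurious edges.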

Superedge Connectivity Decision Making. Let us discuss the details of the procedure connectSuperedge(...), as shown in Algorithm 3. As mentioned before, this procedure returns true if connecting two given supernodes by a superedge is beneficial in terms of utility, or false otherwise. The benefit is defined in terms of the minimum penalty that is paid (lost utility) when a particular action is performed. In our case, there are two possible cases to evaluate: 1) connecting two supernodes S_u and S_v by a superedge, (S_u, S_v) ∈ E_S, and 2) not connecting the supernodes, (S_u, S_v) ∉ E_S. Note that the two supernodes in question can be the same (see line 26, Algorithm 2); in this case, we evaluate the action of self-connecting the given supernode with a superedge (self-loop). Let us understand the implications of each of the actions below.

Case 1: (S_u, S_v) ∈ E_S. When two given supernodes S_u and S_v are connected by a superedge, it induces an all-to-all connection Π_{S_u,S_v} between the sets of base nodes contained in S_u and S_v (per the encoding rules of Figure 1). Consequently, apart from the original cross edges A_{S_u,S_v} ⊆ E in G, we are introducing an additional set of spurious edges {Π_{S_u,S_v} − A_{S_u,S_v}} between the sets of nodes contained in S_u and S_v. Essentially, at this step, reconstruction of G' from the current G_S (as discussed in Section 3) would introduce |Π_{S_u,S_v} − A_{S_u,S_v}| spurious edges as a result of the current action. Additionally, we know that for each introduced spurious edge the utility is penalized by an amount $\frac{1}{\binom{|V|}{2} - |E|}$. Let secost be the total penalty or cost associated with the action of connecting supernodes S_u and S_v.

Case 2: (S_u, S_v) ∉ E_S. We know that if supernodes S_u and S_v are not connected, then we are missing the set of A_{S_u,S_v} original edges. In other words, reconstruction of G' from the current G_S would have deleted |A_{S_u,S_v}| edges that existed in G. Since the importance score of each edge e in G is given by edgeIS[e], for each missing edge e the utility has to be penalized by an amount of edgeIS[e]. Let nsecost be the total penalty incurred when the action of not connecting supernodes S_u and S_v is performed.

Finally, if secost > nsecost, then the benefit of not connecting the given supernodes S_u and S_v is higher, and vice versa.

Incremental Utility Calculation.
To accurately calculate the utility at each iteration in an incremental fashion, we need to keep track of all actions and related penalties that have been imposed in previous iterations. This bookkeeping is explicit, to avoid redundant penalization of the utility at each iteration. For example, let's say we are evaluating the action of connecting two supernodes S_u and S_v by a superedge. Performing this action equates to the introduction of one or more spurious edges in the underlying graph between the sets of base nodes contained in S_u and S_v. In principle, we must penalize the utility for the introduced spurious edges. However, it may be the case that in previous summarization steps the utility has already been penalized for spurious edges that we are considering in the current step. Thus, we need to keep track of the spurious edges that we have penalized the utility for at each iteration. On the other hand, when we are evaluating (S_u, S_v) ∈ E_S, we need not penalize for the original cross edges A_{S_u,S_v} between S_u and S_v. However, in previous iterations, some finalized action might have penalized the utility for some or all of these original edges in E. Thus, we need to roll back the penalty of these edges in the current action. This indicates that we need to keep track of original edges as well as spurious edges that we might have penalized the utility for in previous iterations. Accordingly, the amount of bookkeeping needed is in the order of $O\left(\binom{|V|}{2}\right)$. This large space requirement makes it impractical to use any kind of deterministic data structure (list, hash table, hash set, etc.) for the purposes of bookkeeping. Instead, we need a more space-efficient data structure to keep track of processed edges.

Note that our algorithm can be modified slightly to provide k-anonymity guarantees [] under favorable conditions. A supernode comprising k nodes will be k-anonymous, and the supernode comprising the minimum number of original nodes can be considered an anonymity lower bound.

Algorithm 3 Utility-Driven Superedge Connectivity Decision Maker and Incremental Utility Calculator
  procedure CONNECTSUPEREDGE(V_Sw, V_Sn, G, edgeIS)
      Initialize: penalty = 0; secost = 0; nsecost = 0; decision = false; CF = (cap, size, fSize); f_se^+ = ∅; f_se^- = ∅; f_nse^+ = ∅; f_nse^- = ∅
      totalse = 1 / (C(|V|, 2) − |E|)
      for u ∈ V_Sw do
          for v ∈ V_Sn do
              if u ≠ v and (u, v) not seen before then
                  e = (u, v)
                  if e ∈ E and e ∈ CF then
                      secost = secost − edgeIS[e]
                      f_se^- = f_se^- ∪ e
                  else if e ∉ E and e ∈ CF then
                      nsecost = nsecost − totalse
                      f_nse^- = f_nse^- ∪ e
                  else if e ∈ E and e ∉ CF then
                      nsecost = nsecost + edgeIS[e]
                      f_nse^+ = f_nse^+ ∪ e
                  else if e ∉ E and e ∉ CF then
                      secost = secost + totalse
                      f_se^+ = f_se^+ ∪ e
                  end if
              end if
          end for
      end for
      if secost < nsecost then
          penalty = secost
          CF.insert((u, v)), for all (u, v) ∈ f_se^+
          CF.delete((u, v)), for all (u, v) ∈ f_se^-
          decision = true
      else
          penalty = nsecost
          CF.insert((u, v)), for all (u, v) ∈ f_nse^+
          CF.delete((u, v)), for all (u, v) ∈ f_nse^-
          decision = false
      end if
      return (decision, penalty)
  end procedure

Probabilistic Data Structures to the Rescue. A Bloom filter is a potential option, as it is a space-efficient data structure that can be used to keep track of already processed edges. Processed edges marked in the Bloom filter indicate that the utility has been (potentially) penalized for these edges. As discussed before, often certain penalties for already processed edges need to be rolled back. This implies that these edges should be deleted from the Bloom filter in such situations. Unfortunately, standard Bloom filters do not support deletion of items. However, certain variants of the Bloom filter, such as the counting Bloom filter, allow both addition and deletion of items, but with significant space overhead. In fact, counting Bloom filters [9, 8] are known to use several times more space to retain the same false positive rate as a space-optimized Bloom filter. Fan et al. [8] introduced Cuckoo filters (CF). CF possess the dual advantage of space efficiency as well as the ability to handle deletion of items. Given their advantages, we make use of CF to manage the bookkeeping of processed edges and the corresponding rollbacks.

Over-Optimism in Utility.
We know that probabilistic data structures suffer from the problem of false positives, i.e., they may identify an item as a set member even though it is not. Cuckoo filters allow the false positive rate to be controlled by varying the capacity and fingerprint size [8]. Because of the false positives introduced by CF, there is a possibility of unwarranted optimism in the calculation of utility. From Algorithm 3, we know that f_se^- is the set of original edges already processed in previous steps as confirmed by CF, and f_nse^+ is the set of original edges that are yet to be evaluated. Similarly, f_nse^- is the set of spurious edges already evaluated in previous steps as confirmed by CF, and f_se^+ is the set of spurious edges yet to be processed. We analyze two specific cases.

In the case where (S_u, S_v) ∈ E_S, we connect the given supernodes S_u and S_v with a superedge. This action introduces spurious edges between the nodes in the given supernodes. We denote this set of spurious edges as {Π_{S_u,S_v} − A_{S_u,S_v}}. The set of spurious edges f_se^+ that are yet to be evaluated is calculated as {Π_{S_u,S_v} − A_{S_u,S_v} − f_nse^-}. We want to penalize the utility for the extra spurious edges that have not been processed in previous iterations. In addition, we need to roll back penalties for the original cross edges that were processed in previous iterations, for which the utility has already been penalized. The total penalty secost is calculated by subtracting the total cost of the edges in f_se^- from the total cost of the edges in f_se^+:

$$secost = \frac{|f_{se}^{+}|}{\binom{|V|}{2} - |E|} - \sum_{e \in f_{se}^{-}} edgeIS[e] \qquad (4)$$

The current utility at this step is calculated as utility = utility − secost.

Theorem 2. If fpr is the false positive rate of CF and if (S_u, S_v) ∈ E_S, then we have an upper bound on the utility over-estimation δ_se, where

$$\delta_{se} \leq \frac{|\Pi_{S_u,S_v} - A_{S_u,S_v} - f_{nse}^{-}|}{\binom{|V|}{2} - |E|} \cdot \frac{fpr}{1 - fpr}$$

PROOF. By the definition of the false positive rate, we know that

$$fpr = \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}$$

From this we can derive an expression for the false positives in terms of fpr and the true negatives. Also, let det_se^+ be the set of spurious edges that are yet to be evaluated, and det_se^- be the set of original edges already processed in previous steps, as confirmed by a deterministic data structure (e.g., a hash table). We know that f_se^+ and f_se^- are calculated based on a probabilistic data structure, in our case a Cuckoo filter. Therefore, the utility over-estimation is the difference Δsecost between the secost calculated based on the deterministic and the probabilistic data structures:

$$\delta_{se} = \Delta secost = \frac{|det_{se}^{+}| - |f_{se}^{+}|}{\binom{|V|}{2} - |E|} - \sum_{e \in \{det_{se}^{-} - f_{se}^{-}\}} edgeIS[e]$$

To find the upper bound, we need to find the maximum value of Δsecost, i.e., minimize $\sum_{e \in \{det_{se}^{-} - f_{se}^{-}\}} edgeIS[e]$. We know that we need at least one edge between the nodes in S_u and S_v to connect these supernodes with a superedge. Let us consider a single edge connecting S_u and S_v and let ε be the importance score of this edge. For a given original graph of large size, the value of ε can be close to zero and we can safely ignore it.
So we have:

$$\delta_{se} \le \frac{|det_{se}^{+}| - |f_{se}^{+}|}{\binom{|V|}{2} - |E|} - \varepsilon \approx \frac{|det_{se}^{+}| - |f_{se}^{+}|}{\binom{|V|}{2} - |E|} = \frac{\text{false positives}}{\binom{|V|}{2} - |E|} = \frac{\text{true negatives}}{\binom{|V|}{2} - |E|} \cdot \frac{fpr}{1 - fpr} \tag{5}$$

Here, true negatives is nothing but the set of spurious edges yet to be evaluated (i.e., $f_{se}^{+}$), and we know that $f_{se}^{+} = \{\Pi_{S_u,S_v} \setminus A_{S_u,S_v} \setminus f_{nse}^{-}\}$. Thus, we have an upper bound for the utility over-estimation. Similarly, in the case of $(S_u,S_v) \notin E_S$, we can calculate the utility over-estimation by analyzing the cost of not connecting $S_u$ and $S_v$.

In summary, use of CF for the purpose of incremental utility calculation can result in over-optimism because of false positives. However, with careful selection of the capacity and fingerprint size of the CF, the false positive rate can be made sufficiently small. Subsequently, utility over-estimation becomes almost negligible. Essentially, increasing capacity improves the occupancy of the cuckoo hash table, whereas increasing the fingerprint (hash) size rejects more false queries, thereby reducing the false positive rate, but with the caveat of increased space overhead.

Time Complexity Analysis. Since the calculation of importance scores in the summarization algorithm depends on the choice of underlying centrality algorithm, we focus on the time complexity of its iterative node-merging loop. In each merge step, for each potential neighbor of a merged supernode, $O(d_v)$, we evaluate connectivity between the merged supernode and its potential neighbor, $O(|V|)$. Therefore, the overall complexity of each merge step comes out to $O(|V| \cdot d_v)$, where $d_v$ is the average degree.

Limitations. The key limitation of this algorithm is that it does not scale well for large graphs. This is because node merging and superedge decision making are exhaustive in nature and perform redundant computations. For example, consider Figure 5(a), which shows a portion of the base graph in which a group of three nodes is more densely connected to a group of four nodes than to the node set {e, f, g}. Figure 5(b) shows an iteration of graph summarization in which three supernodes are formed from these groups, one of them being {e, f, g}. In this iteration, two of the supernodes are evaluated against the third for connectivity. The comparisons made to decide connectivity between a pair of supernodes are denoted by $comp(\cdot,\cdot)$, and further comparisons are made to decide self-connectivity for each of the two supernodes. However, in the next iteration (Figure 5(c)), we are merging two of these supernodes to form supernode $w$.
In order to evaluate connectivity between $w$ and the remaining supernode, we perform redundant comparisons between the nodes contained in supernode $w$ and the nodes in that supernode, comparisons that have already been performed in the previous iteration. Even to decide the self-connectivity of $w$, many redundant computations are performed. All of these redundant comparisons could have been avoided if we were to reuse previous computations. This insight leads us to a more efficient approach, discussed next.

Figure 5: Example illustrating redundant computations. (a) Portion of the original graph. (b) Portion of the graph summary showing superedge decision making between pairs of supernodes and self-connections. (c) Portion of the graph summary showing superedge decision making between the merged supernode $w$ and other supernodes, and self-connections. Panels (b) and (c) also tally the total comparisons as a sum of $comp(\cdot,\cdot)$ terms.

Memoization-based Approach

To overcome scalability challenges, we introduce a memoization technique as a scalable approach. The key goal is to compute graph summaries and perform incremental utility calculation by reusing previous computations. Initially, each node and edge in the base graph $G$ is its own supernode and superedge in the summary graph $G_S$. We start by defining three variables for each superedge $(S_a,S_b) \in E_S$ in $G_S$: $secost(S_a,S_b)$, $nsecost(S_a,S_b)$ and $(S_a,S_b)_{exist}$. Because $S_a$ and $S_b$ are already connected, the value of $secost(S_a,S_b)$ is initialized to 0 (for all superedges). Also, initially

when $S_a = \{a\}$ and $S_b = \{b\}$, not deciding to connect a superedge between supernodes $S_a$ and $S_b$ incurs a cost of $edgeIS[(S_a,S_b)]$. Therefore, nsecost for all superedges is initialized to the corresponding edgeIS[e] values. The variable $(S_a,S_b)_{exist}$ indicates whether a given superedge is permanent (value of 1) or ephemeral (value of 0). An ephemeral superedge indicates that we have decided not to connect the two given supernodes based on the result of the superedge decision-making process, while a permanent superedge indicates the opposite. The key advantage of an ephemeral superedge is that it provides a low-cost way to store the calculated penalty costs for both connecting and not connecting a particular superedge. Although an ephemeral superedge is not considered a real edge, it helps us judiciously re-use the pre-computed penalty costs stored in it for upcoming cost computations. Initially, all the superedges are permanent; therefore, the value of $(S_a,S_b)_{exist}$ for all edges $(S_a,S_b) \in E_S$ is set to 1. Initialization of all the superedge variables with the required conditions is shown in Equation 6.

$$secost(S_a,S_b) = 0, \quad nsecost(S_a,S_b) = edgeIS[(S_a,S_b)], \quad (S_a,S_b)_{exist} = 1, \quad \text{if } S_a = \{a\},\ S_b = \{b\},\ \forall (a,b) \in E,\ (S_a,S_b) \in E_S \tag{6}$$

After initialization, in the upcoming iterations, the connectivity costs secost and nsecost can be calculated by reusing costs calculated in previous iterations, as shown in Equations 7 and 8. For instance, let's say at iteration $t$ we are evaluating connectivity between supernodes $S_u$ and $S_w$ and calculate the utility penalty costs $secost(S_u,S_w)$ and $nsecost(S_u,S_w)$. Let's say we decide not to connect $S_u$ and $S_w$ because nsecost is less than secost. At this point, the current utility is calculated as utility = utility $-$ $nsecost(S_u,S_w)$. So in the summary graph we connect an ephemeral edge between the given supernodes and set $(S_u,S_w)_{exist} = 0$. In a particular future iteration $t + k$, where $k \ge 1$, if we want to calculate the cost to connect supernodes $S_u$ and $S_w$, then we need to nullify the penalty previously subtracted for disconnecting the given supernodes in iteration $t$. More formally, $secost(S_u,S_w)$ at iteration $t + k$ is calculated by reusing previous computations as $secost(S_u,S_w)_{t+k} = secost(S_u,S_w)_{t} - nsecost(S_u,S_w)_{t}$. However, if supernodes $S_u$ and $S_w$ were never evaluated before for connectivity, then secost is calculated by estimating the penalty for introducing spurious edges across the nodes contained in the given supernodes.
In a similar essence, nsecost is calculated by reusing previously computed values, as shown in Equation 8.

$$secost(S_u,S_w) = \begin{cases} secost(S_u,S_w) - nsecost(S_u,S_w) & \text{if } (S_u,S_w) \in E_S,\ (S_u,S_w)_{exist} = 0 \\ \dfrac{|S_u|\,|S_w|}{\binom{|V|}{2} - |E|} & \text{if } (S_u,S_w) \notin E_S \end{cases} \tag{7}$$

$$nsecost(S_u,S_w) = nsecost(S_u,S_w) - secost(S_u,S_w) \quad \text{if } (S_u,S_w) \in E_S,\ (S_u,S_w)_{exist} = 1 \tag{8}$$

Given the values of penalties calculated in the previous iterations for supernode pairs $(S_u,S_w)$ and $(S_v,S_w)$, we calculate utility penalties for the supernode pair $(S_{uv},S_w)$ using Equations 9 and 10. Here, $S_{uv}$ is the supernode obtained by merging supernodes $S_u$ and $S_v$. For example, the secost of connecting the merged supernode $S_{uv}$ with an existing supernode $S_w$ is calculated by adding the individual costs of $(S_u,S_w)$ and $(S_v,S_w)$. Similarly, we compute the nsecost of $(S_{uv},S_w)$ by reusing the individual costs of $(S_u,S_w)$ and $(S_v,S_w)$.

$$secost(S_{uv},S_w) = secost(S_u,S_w) + secost(S_v,S_w) \tag{9}$$

$$nsecost(S_{uv},S_w) = nsecost(S_u,S_w) + nsecost(S_v,S_w) \tag{10}$$

Once we have the individual penalty costs of evaluating connectivity between the merged supernode $S_{uv}$ and its potential neighbors, the total penalty cost of merging any two supernodes $S_u$ and $S_v$ is calculated by summing the corresponding individual costs of evaluating connectivity of $S_{uv}$ with its potential neighbors. Equations 11 and 12 show the calculations.

$$secost(S_{uv}) = \sum_{w \in \eta_m,\ m \in \{S_u \cup S_v\}} secost(S_{uv},S_w) \tag{11}$$

$$nsecost(S_{uv}) = \sum_{w \in \eta_m,\ m \in \{S_u \cup S_v\}} nsecost(S_{uv},S_w) \tag{12}$$

Finally, the utility penalty or costs associated with the self-connectivity decision for the merged supernode $S_{uv}$ can also be calculated by adding the pre-computed costs of $S_u$'s self-connectivity $(S_u,S_u)$, $S_v$'s self-connectivity, and the cost associated with evaluating the supernode pair $(S_u,S_v)$. The calculations are shown in Equations 13 and 14.

$$secost(S_{uv},S_{uv}) = secost(S_u,S_v) + secost(S_u,S_u) + secost(S_v,S_v) \tag{13}$$

$$nsecost(S_{uv},S_{uv}) = nsecost(S_u,S_v) + nsecost(S_u,S_u) + nsecost(S_v,S_v) \tag{14}$$

In summary, given a newly merged supernode $S_{uv}$ and its potential neighbor $S_w$, to evaluate connectivity between them we reuse the previous computations between $S_u, S_w$ and $S_v, S_w$, as opposed to redundantly performing comparisons between the base nodes contained in $S_{uv}$ and $S_w$, as done in the previous approach. As shown in the algorithm for this approach, we do this for all potential neighbors.
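The cost reuse in Equations 9, 10, 13 and 14, together with the never-evaluated case of Equation 7, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Costs` record, the `cost` dictionary keyed by unordered supernode pairs, the `sizes` map and all function names are hypothetical, and a singleton supernode is assumed to have zero self-connectivity cost because it contains no internal node pairs. The rollback driven by the exist flag (first case of Equation 7, and Equation 8) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    secost: float = 0.0   # penalty if the pair is connected by a superedge
    nsecost: float = 0.0  # penalty if the pair is left unconnected


def key(a, b):
    """Order-independent dictionary key for a supernode pair."""
    return (a, b) if a <= b else (b, a)


def lookup(cost, sizes, total_se, a, b):
    """Return stored costs, or a fallback for a pair that was never evaluated."""
    stored = cost.get(key(a, b))
    if stored is not None:
        return stored
    if a == b:
        # Assumption: an unseen self pair is a singleton with no internal pairs.
        return Costs()
    # Second case of Equation 7: every cross pair would be a spurious edge.
    return Costs(secost=sizes[a] * sizes[b] * total_se)


def merge_costs(cost, sizes, total_se, u, v, uv, neighbors):
    """Store costs for merged supernode `uv` by reusing the costs of `u` and `v`."""
    sizes[uv] = sizes[u] + sizes[v]
    for w in neighbors:
        cu = lookup(cost, sizes, total_se, u, w)
        cv = lookup(cost, sizes, total_se, v, w)
        # Equations 9 and 10: costs against a neighbor simply add up.
        cost[key(uv, w)] = Costs(cu.secost + cv.secost, cu.nsecost + cv.nsecost)
    # Equations 13 and 14: self-connectivity reuses (u,v), (u,u) and (v,v).
    parts = [lookup(cost, sizes, total_se, a, b) for a, b in ((u, v), (u, u), (v, v))]
    cost[key(uv, uv)] = Costs(sum(p.secost for p in parts),
                              sum(p.nsecost for p in parts))
    return cost[key(uv, uv)]
```

Each call touches one stored record per neighbor instead of comparing base nodes pairwise, which is the saving the memoization relies on. Here `total_se` stands for the per-spurious-edge penalty $1 / (\binom{|V|}{2} - |E|)$.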
As a result, by avoiding redundant computations, we have effectively reduced the complexity of each merge step from $O(|V| \cdot d_v)$ in the previous approach to $O(d_v)$ in the current approach. Also, by storing penalty costs (secost and nsecost) for each superedge and using the concept of ephemeral edges, we have introduced an extremely low-overhead way to keep track of the penalty costs for all pairs of supernodes, whether or not we decided to connect them. These stored penalty costs are used to efficiently calculate costs in upcoming computations.

Discussion: While memoization reduces the time complexity of each merging step, the time complexity of computing the importance scores can still be high. We improve the performance of this step by making use of fast approximation algorithms for centrality calculation. For example, considering a betweenness-centrality-based utility function, we make use of an approach that uses random sampling of shortest paths to estimate centrality values for all the nodes/edges []. The algorithm runs in the order of $O(|E|)$ per sample and, interestingly, the number of samples needed to compute a good approximation for all vertices is constant and independent of $G$.

Finally, we note that the techniques proposed in this paper are not limited to undirected and unweighted graphs. For instance, the calculation of importance scores can be easily adapted to directed/weighted graphs, as centrality computing algorithms exist for directed, weighted graphs as well. On the other hand, in the grouping step, node pair candidates to merge at each step can also be picked based on directions. For instance, if in a directed graph two directed edges link three nodes in sequence, then the two end nodes can be one such candidate pair to merge. Also, since our utility function depends on the calculation of importance scores for nodes and edges, it naturally adapts to weighted graphs.

5. EXPERIMENTAL EVALUATION

5.1 Experimental Settings

Setup. We perform all our experiments on a single Amazon EC2 m4.4xlarge instance with 16 vCPU, 64 GB memory, and SSD storage. We use Python and create graphs using the Networkx [6] library. For certain scalable centrality implementations we rely on the networkit [] library. To scale for the large datasets that barely fit in memory, we made several programmatic improvements to our code. For example, we carefully parallelize the loop

of the algorithm. Also, we modified the Networkx library to support external-memory graph access (read and write). Specifically, we extend Networkx by subclassing the Graph class and providing user-defined factory functions. These functions query a database and cache the results in the dictionaries used by Networkx.

Datasets. In our experiments we make use of seven real-world undirected and unlabeled graph datasets. Among them, ca-GrQc, ca-AstroPh, ca-HepTh, and ca-HepPh are author collaboration networks from the e-print arXiv for the Astrophysics, High Energy Physics, High Energy Physics Theory, and General Relativity categories. The dataset com-Amazon has a connection between any two products if they are co-purchased. LiveJournal and Friendster are online blogging and gaming networks. All the datasets can be downloaded from [9]. The datasets table presents the datasets and their properties, such as size, average degree (Avg. Deg.), density, average clustering coefficient (Cl. Co.), number of connected components (CCs), and size of the largest component (LC).

Table: Real-world graph datasets. Columns: Dataset, Nodes, Edges, Avg. Deg., Density, Cl. Co., CCs, LC. Rows: ca-GrQc, ca-HepTh, ca-HepPh, ca-AstroPh, com-Amazon, com-LiveJournal, com-Friendster.

Baselines. We closely study two key works in the literature that provide iterative solutions for grouping-based greedy graph summarization. The first is the work by Navlakha et al. [5] and the second is by Tn et al. []. Because [] builds on [5] and provides the distributed solution for it, we implement the algorithm discussed in [5] as a baseline. We have added experimental results in Section 5 (Figure 7) comparing our results with the state-of-the-art grouping-based summarization technique by Navlakha et al. [5]. According to this technique, the best pair of nodes is selected at each step on the basis of maximum gain. Gain is defined as the extent of compression achieved when the selected pair of nodes is merged. To scale this technique, the authors select a node $u$ at random, and a neighbor $v$ within 2 hops is selected that achieves maximum gain when merged with $u$. This is repeated until the required compression is achieved. Since this technique is based on the theory of Minimum Descriptive Length, it is labeled accordingly in our experiments. Next, we highlight the key design decisions that we made in our technique and replace each design decision with its random counterpart to create our other set of baselines.
We make two key design decisions in our technique. First, we compute relative importance scores for nodes and edges using the shortest-path betweenness centrality metric. Second, we select a pair of 2-hop neighbors in the ascending order of the sum of their importance scores. We randomize these key steps by 1) randomly assigning importance scores to nodes and edges, 2) selecting the pair of 2-hop nodes in random order while assigning importance scores using betweenness centrality, and 3) performing both steps randomly. For the random baselines we report the average of ten runs.

Evaluation Metrics. We evaluate our techniques using two popular real-world applications, measure the application-specific utility, and compare with our baselines. For each application, we define a utility metric that indicates the usefulness of a graph summary with respect to the corresponding application.

Application 1: Top-k Query. One of the widely used real-world applications is the selection of the top-k or top t% of nodes, where the goal is to rank nodes using the Pagerank algorithm and select the top k nodes according to their ranks, in descending order. Given the value of $t$, $k$ is derived as $k = |V| \times t\%$ for $G$, whereas for $G_S$, $k = |V_S| \times t\%$. If we run Pagerank on both graph $G$ and its summary $G_S$, and $V_{t\%}$ is the set of top-k nodes in $G$ based on Pagerank values, then the utility of $G_S$ is defined as:

$$\text{Top-k Query App Utility} = \frac{\sum_{v \in V_{t\%}} \frac{1}{|S_v|}}{k} \tag{18}$$

In other words, if all the top-k (or t%) nodes from $G$ match exactly with the top-k nodes in $G_S$, then the utility score equals 1, with each node contributing 1 to the summation in the numerator of Equation 18 because $|S_v| = 1$ for that node. On the other hand, in the case where some of the top-k nodes are contained within a supernode containing more than one node, each such node $u$ contributes a value of $\frac{1}{|S_u|}$. This fraction (which is < 1) represents the information loss caused by the summarization process.

Application 2: Link Prediction. Another real-world application is knowing whether a given pair of nodes belongs to the same community. In other words, based on the current community structure, we predict whether there will be a link between the given pair of nodes. To measure the utility of $G_S$, we consider a list of all pairs of 2-hop nodes in graph $G$. For each pair, we predict a link if the pair belongs to the same community in $G_S$, and we compare the result with the link prediction on $G$. More formally, if $L_S$ is the binary link prediction result vector for $G_S$, where each element corresponds to the link prediction result for one of the 2-hop pairs, and $L$ is the result vector for $G$, then the utility of $G_S$ is defined as:

$$\text{Link Prediction App Utility} = \frac{L_S \cdot L}{|L|} \tag{19}$$

Example Criteria for Desirable Utility Function to Satisfy. In addition to the example test criteria described earlier, here we provide a list of more example criteria, shown in the table of example test criteria. These example criteria, based on the model graphs, help us understand the properties defined earlier and evaluate the desirability of a utility function. Note that these criteria are not exhaustive, and other criteria can be devised using the model graphs.

Table: Example test criteria (columns: Desirable Property, Example Test Criteria)

Spurious edge awareness. Let's consider graphs $m_X C_n$, $m_Y C_n$, and $m_Z C_n$ ($X < Y < Z$) from the model graphs. Let $m_Z C_n$ be the base graph, and let $m_X C_n$ and $m_Y C_n$ be perturbed graphs obtained by introducing $X$ and $Y$ spurious edges to $m_Z C_n$. Then, according to the spurious edge awareness criterion, the following condition should be satisfied:

$$EU(m_X C_n)_{m_Z C_n} - EU(m_Y C_n)_{m_Z C_n} > 0 \tag{15}$$

Weight awareness. Consider weighted barbell graphs $w_s B_n$, $w_t B_n$ and $mB_n$. Here $w_s B_n$ is a barbell graph of size $n$ with the weight of exactly one of the edges being $s$ and the weights on the rest of the edges being $r$, where $s > r$. In this case, let $mB_n$ be a barbell graph with the heavy-weighted edge removed. If $s > t$, then according to the weight awareness criterion, the following should be satisfied:

$$EU(w_t B_n)_{mB_n} - EU(w_s B_n)_{mB_n} > 0 \tag{16}$$

Edge submodularity. Consider graphs $K_n$, $mK_n$ and $C_n$, $mC_n$ from the model graphs. These four graphs are equally sized in terms of number of nodes, where $C_n$ has relatively fewer edges than $K_n$. Graph $mK_n$ is obtained by removing a single edge from $K_n$; similarly, $mC_n$ is obtained by removing a single edge from $C_n$. Then, according to the edge submodularity criterion:

$$EU(mK_n)_{K_n} - EU(mC_n)_{C_n} > 0 \tag{17}$$

5.2 Experimental Results

Figure 6: Experimental results demonstrating effectiveness of design decisions. For each of the seven datasets, the panels plot Top-k Query App Utility (for several top-t% selections) and Link Prediction App Utility against Reduction in Nodes (RN).

Current-flow and Shortest Path Betweenness Centrality-based Utility Function Satisfies All Desired Properties. We start by evaluating the suitability of various centrality metrics that can be used during the calculation of edge importance scores, and form a utility function that exhibits the desired properties described earlier. Generally, the relative importance of each edge in the graph $G$ is assessed by measuring the degree of participation of edges in communication between distinct parts of the network. This leads us to the notion of betweenness centrality. The most common betweenness centrality metric is based on shortest paths, where the centrality of an edge $e$ is essentially the average number of shortest paths, over all pairs of nodes in the graph, that pass through edge $e$. There are some drawbacks with this approach. First, it takes into account only the shortest paths and ignores the slightly longer paths. Edges of such relatively longer paths are critical for communication in the network. Second, the actual number of shortest paths that lie between the source and destination is irrelevant. In our case, it is reasonable to consider the abundance and the length of all paths. Knowledge about the importance of each edge to the graph structure is enhanced when more routes are possible. In order to take such paths into account, Current-Flow Betweenness centrality can be considered []. Here, the graph is imagined as a resistor network in which the edges are resistors and the nodes are junctions between resistors. Accordingly, the current-flow betweenness of an edge is the amount of current that flows through it, averaged over all source-destination pairs, when one unit of current is induced at the source and the destination (sink) is connected to the ground. Let's denote the shortest-path and current-flow betweenness centrality-based graph utility functions as SP-BCU and CF-BCU.

Moreover, certain centrality metrics calculate centrality scores for nodes and cannot be directly used to calculate edge centrality, e.g., centrality metrics that are based on Pagerank, Eigenvector [7], Communicability [6], Communicability Betweenness [7], etc. Here, we treat the node centrality scores as node importance. Also, by intuition, we assign importance scores to edges based on the importance of the nodes they connect, i.e., we assign an edge high importance if it connects any two highly important nodes.
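The two ways of scoring edges described above can be sketched in plain Python. This is an illustrative sketch, not the code used in the experiments (which rely on Networkx and networkit): the function names are hypothetical, the graph is assumed to be undirected and unweighted and given as an adjacency dictionary with sortable node labels, and the node-based variant assumes that scores are normalized by their sum, with each edge receiving the rescaled sum of its two endpoint scores.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def sp_edge_betweenness(adj):
    """Average, over all node pairs, of the fraction of shortest paths using each edge."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    score = dict.fromkeys(edges, 0.0)
    pairs = list(combinations(sorted(adj), 2))
    for s, t in pairs:
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # a disconnected pair contributes nothing
        for u, v in edges:
            for a, b in ((u, v), (v, u)):  # the edge may be crossed in either direction
                if (a in dist_s and b in dist_t
                        and dist_s[a] + 1 + dist_t[b] == dist_s[t]):
                    score[(u, v)] += sigma_s[a] * sigma_t[b] / sigma_s[t]
    return {e: val / max(len(pairs), 1) for e, val in score.items()}


def edge_importance_from_nodes(node_score, edges):
    """Edge importance from node centrality: rescaled sum of endpoint scores."""
    total = sum(node_score.values()) or 1.0
    raw = {e: (node_score[e[0]] + node_score[e[1]]) / total for e in edges}
    norm = sum(raw.values()) or 1.0
    return {e: r / norm for e, r in raw.items()}
```

The betweenness routine is written for clarity and costs on the order of $|V|^2 |E|$ operations, so it only suits small graphs; for anything larger, a library implementation of Brandes' algorithm or a sampling-based approximation is the practical choice.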
Using these node-based centrality measures, we estimate an edge's importance by summing up the normalized centrality scores of the pair of nodes it connects, and normalizing the result. Let's denote the utility functions based on these centrality metrics as PRU, EVU, COU, and CO-BCU. We compare these utility functions and evaluate their effectiveness using the model graphs and our ideal utility function properties. Table 6 demonstrates our evaluation results. Red cells or non-positive values indicate violation of the corresponding property or criterion. Results show that CF-BCU and BCU obey all the formal required properties (C-C). Bold values represent max values that are highly discriminatory for each test criterion. We find CF-BCU to be most effective and highly discriminatory. Each row of the tables corresponds to a comparison between the similarities (or distances) of two pairs of graphs; pairs (A,B) and (A,C) for property (C-C); and pairs (A,B) and (C,D)

Table: Practicality of utility EU with respect to an application of top-k query. Application; Datasets: ca-GrQc, ca-HepTh, ca-HepPh, ca-AstroPh, com-Amazon, com-LiveJournal, com-Friendster. Top % Nodes: Pearson's r and Cos. Sim. for each dataset