Identifying Influencers in Social Networks

Kushal Dave, IIIT-Hyderabad, Hyderabad, India, kushal.dave@research.iiit.ac.in
Rushi Bhatt, Yahoo! Labs, Bangalore, India, rushi@yahoo-inc.com
Vasudeva Varma, IIIT-Hyderabad, Hyderabad, India, vv@iiit.ac.in

Abstract
The central idea in designing various marketing strategies for online social networks is to identify the influencers in the network. Influential individuals induce word-of-mouth effects in the network. These individuals are responsible for triggering long cascades of influence that convince their peers to perform a similar action (buying a product, for instance). Targeting these influentials usually leads to a vast spread of information across the network. Hence it is important to identify such individuals in a network. One way to measure an individual's influencing capability on its peers is by its reach for a certain action. We formulate identifying the influencers in a network as the problem of predicting the average depth of the cascades an individual can trigger. We first empirically identify factors that play a crucial role in triggering long cascades. Based on this analysis, we build a model for predicting the cascades triggered by a user for an action. The model uses features such as the influencing capabilities of the user and their friends, the influencing capability of the particular action, and other user and network characteristics. Experiments show that the model effectively improves the predictions over several baselines.

Introduction
The rapid development of social networks such as Facebook, Flickr, Twitter and LinkedIn on the Internet has resulted in social influence emerging as a complex force governing the diffusion of influence in the network. The emergence of social influence has allowed various companies to look beyond direct marketing to find potential customers to target. The rich neighborhood information that a social network provides about a user can be leveraged to make intelligent marketing decisions. Viral marketing involves identifying potential customers who can leverage their social contacts to influence their friends to perform a certain action (such as clicking an ad).
One way to quantify the influence exerted by these individuals is to predict the average length of the cascade they trigger among their friends for a certain action. Once these individuals are identified, they can be targeted to achieve large cascades and hence wide reach. We tackle this problem of predicting the average cascade triggered by an individual for a given action using a machine learning approach.
Copyright © Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Most existing research quantifies influence as a function of the influencing capabilities of the target user only. Ideally, it should also depend on how susceptible a friend is to getting influenced by the target user. Indeed, a friend with lower susceptibility to influence may dampen some of the influence from the user. Besides, the depth of the cascade also depends on the particular action to be propagated in the network. For example, a target user may influence a friend to click on an ad about buying a football match ticket, but she may not be able to convince the same friend to click on an ad about an iPhone. Aral and Walker discuss how incorporating viral features into a product (action) can induce peer influence in the network. This phenomenon, in which the triggering of a cascade for an action is a function of the target user, their friends and the action itself, is well explained by Watts and Dodds: "The triggering of cascades in networks has numerous analogs in natural systems. For example, some forest fires are many times larger than average; yet no one would claim that the size of a forest fire can be in any way attributed to the exceptional properties of the spark that ignited it or the size of the tree that was the first to burn. Major forest fires require a conspiracy of wind, temperature, low humidity, and combustible fuel that extends over large tracts of land." From the above discussion it is clear that the concept of an influencer varies with the particular action performed by the user. Tang et al. show that users have varying degrees of influence for different topics.
Thus, the problem of identifying influencers should also take the particular action into account. Based on these foundations, we first analyze how factors such as the influencing ability of the user and of their neighborhood, and the popularity of the action, affect the cascade at a user for an action. Next, we build a regression model that predicts the average cascade triggered by a user for an action. The model uses features such as the influence capabilities of the target user and his/her friends, how prone the friends are to getting influenced by the target user, the influencing capability of the action, and other network and user characteristics. In a nutshell, the major contributions of the paper are as follows. We empirically investigate whether, apart from the user's influence abilities, other peripheral factors, such as the popularity of the action and the influence abilities of the user's neighborhood, also play a role in the spread of the contagion.
We propose a novel method that uses a predictive model to identify individuals who can lead to large cascades of information in a social network. We use a social graph generated from Flickr for our experiments. The data has about …M users, about …B action events and some …K-odd distinct actions.

Related work
The work most relevant to our proposed method is by Richardson and Domingos, Kempe et al., Hartline et al. and Leskovec et al. Richardson and Domingos present a probabilistic model to mine the network value of each individual based on the influence exerted by the user on her neighbors. They show that a user's network value and her intrinsic value can be combined to make optimized marketing decisions. Kempe et al. propose a greedy hill-climbing strategy for picking the top-k influentials, which works better than network heuristics such as degree centrality. Chen et al. improve the running time of the greedy algorithm and propose degree-discount heuristics that improve the influence reach. More recently, Bakshy et al. analyze the propagation of influence on Twitter and explore various marketing strategies governed by the cost of identifying the influencers. The problem of identifying influentials has been modeled in various other ways: Leskovec et al. formulate it as a problem of outbreak detection, while Hartline et al. pose it as a problem of revenue maximization. In most of the previous work, the analysis of influence propagation was confined to the users and their social neighborhood. In this work, we investigate whether the particular action that propagates also plays a role in the propagation. Recently, there has been some work on estimating influence probabilities (the probability of a user influencing others). Tang et al. argue that the influence exerted by an individual varies across topics. Goyal et al. outline various static and dynamic models for estimating the influence probabilities. Saito et al. employ the expectation-maximization (EM) algorithm to learn the influence probabilities for the independent cascade model. Singla and Richardson
go on to show that people who are connected often share their interests and personal characteristics, which demonstrates the existence of homophily in social networks. Anagnostopoulos et al. outline a timestamp-shuffling test to assess whether a social network exhibits a significant influence effect. Concurrently, Aral et al. propose a dynamic matched sampling estimation framework that identifies both homophily and influence effects in a social network. La Fond and Neville propose that influence effects manifest as changes in user attributes, whereas homophily is present in the network if the network structure changes over time. Cha et al. study information dissemination in social graphs generated from Flickr data. Bhatt et al. build a model to predict the future adoption of the PC to Phone product in the Yahoo! IM network.

Problem Formulation
In this section, we present the problem formulation and introduce some terminology. Consider a set of users in a social network, connected by some relation R. The notion of R varies across contexts: in social networks such as Flickr, Facebook, LinkedIn or Twitter, the relation can be being a friend/follower, while in an instant messaging environment the relation can be an interaction between the users. The relation R can be represented by an undirected graph $G = (U, E)$, where $U$ is the set of users and $E$ is the set of edges $(u_i, u_j)$; the edge $(u_i, u_j)$ exists if and only if $u_i$ and $u_j$ are connected by relation R. In addition, each user has features such as age, gender and number of actions performed. For each user $u_i \in U$, we represent the set of features as $X_i = \{x_1, x_2, \ldots, x_n\}$. Each user $u_i$ performs a certain action $a$ at time $t_i$. The definition of an action may vary from context to context, for example clicking on an ad or buying a product online. In this paper, we consider an action to be joining a group on Flickr. With each action $a$, we have a set of features $S_a = \{s_1, s_2, \ldots, s_m\}$. Next, we define the notion of action propagation.
Action Propagation ($u_i \xrightarrow{a} u_j$): An action $a$ is said to propagate from user $u_i$ to user $u_j$ if the following hold:
(1) $u_i$ and $u_j$ are connected by relation R, that is, $(u_i, u_j) \in E$.
(2) User $u_i$ performs action $a$ before user $u_j$, that is, $t_i < t_j$.
(3) The action by user $u_j$ follows within a certain time interval after $u_i$ performs the action, that is, $(t_j - t_i) < \tau$.
The time constraint for action propagation, $(t_j - t_i) < \tau$, is imposed to obtain a tighter bound on the credit given to user $u_i$ for propagating action $a$ to user $u_j$. This is in line with the findings of Anagnostopoulos et al., who use the evidence of temporal clustering to corroborate claims of peer influence. The role of $\tau$ is as follows: if, after performing action $a$ at time $t_i$, user $u_i$ cannot propagate the action to $u_j$ (that is, make user $u_j$ perform action $a$) within time $\tau$, then $u_i$ becomes non-contagious with respect to action $a$. After time $\tau$, even if $u_j$ performs action $a$, it is not credited to user $u_i$. Each user, after performing a particular action, becomes contagious for time $\tau$ and tries to propagate the action to its neighbors in the social graph. Suppose that after time $t < \tau$, $u_i$ succeeds in propagating the action to some of its neighbors; these neighbors in turn become contagious and try to propagate the action to their neighbors, and so on. This leads to a chain of actions (referred to as a cascade), initiated at user $u_i$ for a particular action, propagating across its neighbors within a few hops. These action propagations from one user to another can be represented as directed acyclic graphs (condition (2) rules out cycles). In this paper, we refer to such graphs as action graphs.
Action Graph: An action graph for action $a$, $G_a = (U_a, E_a)$, consists of the set of users $U_a$ who have performed action $a$ at some point in time and the set of directed edges $(u_i, u_j)$ such that action $a$ was propagated from node $u_i$ to node $u_j$.
Propagation set $P_a(u_i)$: Each user $u_i$ has a propagation set $P_a(u_i)$ consisting of all the immediate neighbors of $u_i$ in the social graph to which there was an action propagation from $u_i$. Formally, $P_a(u_i) = \{u_j \mid (u_i, u_j) \in E,\ u_i \xrightarrow{a} u_j\}$.
Sample graphs for an action initiated at two users are shown in Figure 1. Identifying the set of users who can trigger large cascades for a particular action is of great
interest in various contexts.
[Figure 1: Sample action graphs for an action, panels (a) and (b).]
For example, an advertiser might want to start showing a particular ad to this special set of identified individuals, who can promise more reach through their neighbors. The reach of a user can be quantified by the average number of cascades triggered by the user for an action, which leads us to the definition of reach. Let $reach_a(u_i)$ be the reach of user $u_i$ for action $a$; it is defined recursively as follows:
$$reach_a(u_i) = \begin{cases} |P_a(u_i)| + \sum_{u_j \in P_a(u_i)} \frac{1}{2}\, reach_a(u_j) & \text{if } P_a(u_i) \neq \emptyset \\ 0 & \text{otherwise} \end{cases}$$
A user gets a credit of 1 for an action propagation to each immediate neighbor. The factor $\frac{1}{2}$ applied to the reach of the descendants of user $u_i$ in the action graph assigns a decaying credit as we move further from the node in the action graph. For example, in Figure 1(a), the initiating user gets complete credit for propagating the action to its immediate neighbors, a credit of 0.5 ($\frac{1}{2}$) times the reach of those neighbors, 0.25 ($(\frac{1}{2})^2$) times the reach of the next level, and so on. The overall reach of the initiating user for the action is the sum of these credits. While computing the reach of a node, we count each descendant node only once: if that node is encountered again through some other path, we discount it. For example, in Figure 1(b), while computing the reach of the initiating user, one of the two edges into node 9 is not counted, and the reach of node 9 is counted only once, since node 9 has already been considered through the other edge. While calculating the reach, paths are considered according to the timestamp at which they were added to the action graph, and they are traversed in ascending order of timestamp. In Figure 1(b), the descendants of the initiating user were added to the action graph in the order …, 9, …, and these nodes are traversed in the same order. Identifying the small set of users who can elicit greater reach for an action can thus be formulated as the problem of predicting the reach of each user for a particular action.
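The recursive definition of reach, with its 1/2 decay per hop and the rule that a node reached again through another path is not credited twice, can be sketched in Python as follows. This is a minimal sketch under our own reading of the text, not the authors' implementation: the edge format `(src, dst, timestamp)`, the function name `reach`, and the choice to let a node claim all of its not-yet-credited children (in timestamp order) before recursing into them are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# (src, dst, t): the action propagated from src to dst; t is when dst performed it
Edge = Tuple[Hashable, Hashable, float]


def reach(root: Hashable, edges: Iterable[Edge], decay: float = 0.5) -> float:
    """Decayed reach of `root` in one action graph (a sketch of reach_a(u_i))."""
    children: Dict[Hashable, List[Tuple[float, Hashable]]] = defaultdict(list)
    for src, dst, t in edges:
        children[src].append((t, dst))
    for out_edges in children.values():
        out_edges.sort(key=lambda pair: pair[0])  # traverse in ascending timestamp order

    credited: Set[Hashable] = {root}  # nodes already reached through some path

    def _reach(u: Hashable) -> float:
        # Propagation set of u, minus nodes already credited through another path.
        fresh = []
        for _, v in children.get(u, []):
            if v not in credited:
                credited.add(v)
                fresh.append(v)
        if not fresh:
            return 0.0  # the "otherwise" branch: empty propagation set
        # Credit 1 per immediate neighbor plus the decayed reach of each neighbor.
        return len(fresh) + decay * sum(_reach(v) for v in fresh)

    return _reach(root)
```

For a chain r → a → b the sketch gives 1 + 0.5 × 1 = 1.5, and in a diamond r → {a, b} → c the node c is credited only through a, so the second edge into c adds nothing.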
Now we can formally define the problem as follows.
Problem: Given a social graph and past action events, accurately predict the reach of each user for a particular action ($reach_a(u_i)$).
Footnote: Any appropriate decay value would have sufficed; we propose to use $(\frac{1}{2})^d$ for depth $d$.

Dataset
To apply our framework, we use Flickr social network data for the experiments. The dataset is a longitudinal combination of the following four datasets:
(1) User data (X), which contains information about the Flickr users.
(2) Contacts data $(u_i, u_j)$, which gives the friends information. We use this data to build the social graph.
(3) User-group membership $(u_i, a, t_i)$, which contains information about a user joining a particular group and the time of joining the group.
(4) Group data (S), which gives various details about a particular group, such as the number of members and the topics of the group.

Algorithm 1: Computing the Action Graph
  Input: C: contacts $(u_i, u_j)$; A: action events $(u, a, t)$
  Output: $G_a = (U_a, E_a)$
  for each tuple $(u_i, u_j)$ in C do
    for each action event $(u, a, t)$ in A do
      if entries for both $u_i$ and $u_j$ with action $a$ exist in A then
        if $0 < t_j - t_i < \tau$ then
          add $u_i$ and $u_j$ to $U_a$
          add directed edge $(u_i, u_j)$ to $E_a$
        else if $0 < t_i - t_j < \tau$ then
          add $u_i$ and $u_j$ to $U_a$
          add directed edge $(u_j, u_i)$ to $E_a$
        end if
      end if
    end for
  end for

The social graph built from the above data contains O(…M) users and O(…M) edges. The action graphs are built from data (2) and (3) as described in Algorithm 1. Without the $\tau$ constraint, the action graph contains O(…B) edges. Figure 2(a) shows the CDF of the propagated actions in the dataset. As can be seen, the duration of propagation for some actions is even greater than … months. Figure 2(b) shows the frequency of the actions propagated within two weeks; note that both axes of Figure 2(b) are log scaled. As shown, the frequency decays exponentially with time. The tail after two weeks, extending up to … months, is quite long (not shown in Figure 2(b)). The exponential decay can be attributed to the fact that when a user performs an action, her friends are more likely to adopt it soon afterwards because they feel an urge to do the action, and this urge may subside with time.
Hence, if a user performs the same action as a peer after a substantially long time, there is a good chance that the user performed the action simply because they share common interests (homophily). In order to confidently attribute an action propagation to peer influence, we set $\tau$ to one week, which gives us a tighter bound on the influence. A similar approach of using a time constraint to distinguish peer influence from homophily has been used by Goyal et al. For each user-action pair $(u_i, a)$, we compute $reach_a(u_i)$ if $u_i$ has ever performed action $a$.
Footnote: Figure 2(b) is rescaled to preserve data confidentiality.
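Algorithm 1 can be sketched as runnable Python under a few assumptions of ours: contacts arrive as undirected `(u_i, u_j)` pairs, membership events as `(user, group, time)` triples with each user joining a group at most once, and $\tau$ is expressed in the same unit as the timestamps (e.g. days, so one week is 7). The function name `build_action_graphs` and the per-user index of join times, which lets each contact pair inspect only the groups both users joined instead of scanning all of A, are illustrative choices rather than the paper's code.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

ActionGraph = Tuple[Set[Hashable], Set[Tuple[Hashable, Hashable]]]  # (U_a, E_a)


def build_action_graphs(
    contacts: Iterable[Tuple[Hashable, Hashable]],
    events: Iterable[Tuple[Hashable, Hashable, float]],
    tau: float,
) -> Dict[Hashable, ActionGraph]:
    """Sketch of Algorithm 1: one action graph G_a = (U_a, E_a) per action a."""
    # Index join times per user: user -> {action: time}.
    joined: Dict[Hashable, Dict[Hashable, float]] = defaultdict(dict)
    for user, action, t in events:
        joined[user][action] = t

    graphs: Dict[Hashable, ActionGraph] = defaultdict(lambda: (set(), set()))
    for ui, uj in contacts:
        acts_i, acts_j = joined.get(ui, {}), joined.get(uj, {})
        for action in acts_i.keys() & acts_j.keys():  # both users performed the action
            ti, tj = acts_i[action], acts_j[action]
            if 0 < tj - ti < tau:      # u_i acted first and u_j followed within tau
                edge = (ui, uj)
            elif 0 < ti - tj < tau:    # u_j acted first and u_i followed within tau
                edge = (uj, ui)
            else:
                continue               # too far apart or simultaneous: no credit
            nodes, edges = graphs[action]
            nodes.update(edge)
            edges.add(edge)
    return dict(graphs)
```

Because an edge is only recorded when one user strictly precedes the other, every resulting action graph is acyclic, matching condition (2) of the propagation definition.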
[Figure 2: (a) CDF for the action events in the dataset (x-axis: duration between actions, from minutes to months). (b) Frequency of actions propagated within two weeks. (c) Reach vs. number of friends of u.]

Factors Affecting Cascades
In this section, we answer the question: which factors play a role in determining $reach_a(u)$ for a user $u$ and action $a$? In particular, we consider various user-level, social-neighborhood-level and action-level factors in the following subsections.
User Level Factors
We first study how $reach_a(u)$ changes with the number of friends of $u$ in the social network. Intuitively, the more friends a user has, the better her chances of propagating the action to the next hop. Figure 2(c) shows $reach_a(u)$ as a function of the number of friends of $u$. The $reach_a(u)$ values are averaged within bins: for example, a value … on the x-axis represents the average reach of all users having a number of friends less than or equal to … but greater than …. As shown, $reach_a(u)$ increases as the number of friends of $u$ increases. This is expected, as high-degree users have a better opportunity to propagate the action than low-degree users. Next, we analyze how $reach_a(u)$ varies with the influencing capability (influence probability) $P_{inf}(u)$ of $u$. It is the ratio of the number of actions that $u$ propagated to at least one of its immediate neighbors to the total number of actions performed by $u$:
$$P_{inf}(u_i) = \frac{\sum_{a} I(u_i, a)}{\#\{\text{actions by } u_i\}}$$
where $I(u_i, a)$ is an indicator function taking the value 1 if action $a$ propagated from user $u_i$ to at least one neighbor, and 0 otherwise. Ideally, the higher the influence probability of a user, the better her reach should be. As shown in Figure 3(a), $reach_a(u)$ increases monotonically with the influence probability of the user. In addition, we refer to the extent to which a user is prone to getting influenced by others as the prone probability of the user, $P_{prone}(u)$. It is the ratio of the number of actions $u_i$ performed under influence to the total number of actions performed by $u_i$:
$$P_{prone}(u_i) = \frac{\sum_{a} I'(u_i, a)}{\#\{\text{actions by } u_i\}}$$
where $I'(u_i, a)$ is 1 if some neighbor propagated action $a$ to $u_i$, and 0 otherwise. The prone probability captures the susceptibility of a user to peer influence.
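Both ratios can be read directly off the action graphs: a user contributes to the numerator of $P_{inf}$ for every action in which it is the source of at least one edge, and to the numerator of $P_{prone}$ for every action in which it is the target of at least one edge. The Python sketch below assumes the action graphs are given as a hypothetical mapping `action_edges` from each action to its directed edges, and the user histories as `user_actions` (user to set of actions); users with no actions are skipped because the ratios are undefined for them.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def influence_and_prone(
    user_actions: Dict[Hashable, Set[Hashable]],
    action_edges: Dict[Hashable, Iterable[Tuple[Hashable, Hashable]]],
) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Return ({user: P_inf}, {user: P_prone}) from per-action propagation edges."""
    propagated: Dict[Hashable, Set[Hashable]] = defaultdict(set)  # I(u_i, a) = 1
    influenced: Dict[Hashable, Set[Hashable]] = defaultdict(set)  # I'(u_i, a) = 1
    for action, edges in action_edges.items():
        for src, dst in edges:
            propagated[src].add(action)  # src passed the action to at least one neighbor
            influenced[dst].add(action)  # dst performed the action under influence
    p_inf: Dict[Hashable, float] = {}
    p_prone: Dict[Hashable, float] = {}
    for user, actions in user_actions.items():
        if not actions:
            continue  # ratio undefined for users without actions
        n = len(actions)
        p_inf[user] = len(propagated.get(user, set()) & actions) / n
        p_prone[user] = len(influenced.get(user, set()) & actions) / n
    return p_inf, p_prone
```

Counting actions as sets means a user who propagated one action to several friends still counts it once, which is what the indicator function in the definition requires.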
Figure 4(a) shows $P_{prone}(u)$ versus $reach_a(u)$ (red line). The reach increases proportionally until $P_{prone}(u)$ reaches …, after which we find only gradual improvement.
Social Neighborhood Factors
In this section, we explore the role of a user's social neighborhood in determining her reach. We consider the influence and prone probabilities of all users in the immediate neighborhood (hop one) and at the second and third hop levels. The motivation behind analyzing these factors is to see whether the influence probability of the neighborhood users contributes to $reach_a(u)$. For each user $u_i$, the influence probability of $u_i$'s neighborhood at hop $k$, $P_{inf}(u_i){:}hop_k$, is the average of $P_{inf}$ over all users at hop level $k$ from $u_i$. Mathematically,
$$P_{inf}(u_i){:}hop_k = \frac{\sum_{u_j \text{ at hop } k \text{ from } u_i} P_{inf}(u_j)}{\#\{u_j \text{ at hop } k \text{ from } u_i\}}$$
Figures 3(b), 3(c) and 3(d) plot reach as a function of $P_{inf}(u){:}hop_k$ for $k = 1, 2$ and $3$ respectively. As before, the $reach_a(u)$ values are averaged within each bin. As shown, as the neighborhood influence probabilities increase, $reach_a(u)$ increases monotonically. As with influence probabilities, we also consider the prone probabilities of the social neighborhood up to hop level 3 from the user. $P_{prone}(u_i){:}hop_k$ is given by
$$P_{prone}(u_i){:}hop_k = \frac{\sum_{u_j \text{ at hop } k \text{ from } u_i} P_{prone}(u_j)}{\#\{u_j \text{ at hop } k \text{ from } u_i\}}$$
The idea behind considering the prone probability of the neighboring users is that the more susceptible the users in the neighborhood are to peer influence, the better the chances of the cascade extending further. Figures 4(b), 4(c) and 4(d) plot $P_{prone}(u){:}hop_k$ versus $reach_a(u)$ for $k = 1, 2$ and $3$. The sudden decline in $reach_a(u)$ for values above … can be attributed to the fact that there were too few values greater than … to obtain a confident estimate of $reach_a(u)$.
[Figure 3: Effect of influence probability on the reach of a user: (a) $P_{inf}(u)$; (b), (c), (d) $P_{inf}(u){:}hop_k$ at hop levels one, two and three.]
[Figure 4: Effect of prone probability on the reach of a user: (a) $P_{prone}(u)$; (b), (c), (d) $P_{prone}(u){:}hop_k$ at hop levels one, two and three.]
We hypothesize that if the influence probability is above a certain threshold, we can deem that factor active and say that it is contagious. For example, if the $P_{inf}(u)$ value is below the threshold, we consider the user inactive; if $P_{inf}(u)$ exceeds the threshold, the user is considered contagious (active). For all of the user, action and neighborhood influence probabilities ($P_{inf}(u)$, $P_{inf}(a)$ and $P_{inf}(u){:}hop_k$), the threshold is set to …. Similarly, if the prone probability is below a certain threshold, we hypothesize that the factor is not susceptible to peer influence (inactive). In our case, if the user prone probability ($P_{prone}(u)$) or the neighborhood prone probability ($P_{prone}(u){:}hop_k$) is less than …, we say that it is inactive; otherwise we consider the user or neighborhood to be active. We use the average values of $P_{inf}$ and $P_{prone}$ as the thresholds. However, the results presented next were similar for other threshold values (…, … and … for $P_{inf}$, and … and … for $P_{prone}$). Table 1 confirms our hypothesis. Row 1 corresponds to only the user being active (in terms of both $P_{inf}$ and $P_{prone}$). Row 2 corresponds to the case where only the user and the hop-1 neighbors are active (while the hop-2 and hop-3 neighbors are inactive), and so on. As shown, as the neighborhood becomes active, the reach of the user increases (column 2). The prone probabilities (column 3) show a similar trend: the reach increases as the neighborhood at each hop becomes susceptible to peer influence.
[Table 1: Reach increases as neighborhood influence and prone probabilities cross the threshold. Rows (active factors): u only; u + h1; u + h1 + h2; u + h1 + h2 + h3. Columns: active factors; average log(reach) for $P_{inf}$; average log(reach) for $P_{prone}$.]
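The hop-level averages and the row grouping of Table 1 can be sketched together in Python. Assumptions of ours, not stated in the paper: the social graph is an adjacency dictionary `adj`, hop levels are exact shortest-path distances found by breadth-first search, a user missing from `prob` counts as 0.0, an empty hop level averages to 0.0, a value counts as active when it is at least the threshold, and pairs with zero reach are skipped because log(0) is undefined. The names `hop_levels`, `neighborhood_average`, `active_prefix` and `mean_log_reach_by_pattern` are hypothetical.

```python
import math
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Set, Tuple


def hop_levels(adj: Dict[Hashable, List[Hashable]], source: Hashable,
               max_hop: int = 3) -> Dict[int, Set[Hashable]]:
    """Breadth-first search: users at exactly hop k from source, k = 1..max_hop."""
    dist = {source: 0}
    levels: Dict[int, Set[Hashable]] = {k: set() for k in range(1, max_hop + 1)}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand beyond the last hop level
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                levels[dist[v]].add(v)
                queue.append(v)
    return levels


def neighborhood_average(adj, source, prob, max_hop=3) -> Dict[int, float]:
    """P(u):hop_k = mean of prob over the users at hop k (0.0 for an empty level)."""
    return {
        k: (sum(prob.get(v, 0.0) for v in users) / len(users) if users else 0.0)
        for k, users in hop_levels(adj, source, max_hop).items()
    }


def active_prefix(values: Sequence[float], threshold: float) -> Optional[int]:
    """values = [user, hop1, hop2, hop3]. Return k when exactly the user and hops
    1..k are active (>= threshold), or None for any other activation pattern."""
    flags = [v >= threshold for v in values]
    if not flags[0]:
        return None
    k = 0
    while k + 1 < len(flags) and flags[k + 1]:
        k += 1
    return None if any(flags[k + 1:]) else k


def mean_log_reach_by_pattern(records: Iterable[Tuple[Sequence[float], float]],
                              threshold: float) -> Dict[int, float]:
    """Average log(reach) per Table 1 row (k = number of active hop levels)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for values, r in records:
        k = active_prefix(values, threshold)
        if k is None or r <= 0:
            continue  # skip non-prefix patterns and zero reach
        sums[k] += math.log(r)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

Row 1 of Table 1 corresponds to `k == 0` (user only) and the last row to `k == 3` (user and all three hop levels active); a pattern such as "user and hop 2 active but hop 1 inactive" fits no row and is dropped.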
[Table 2: Reach increases as user, action and hops become active in conjunction. Rows (active factors): u only; a only; u + a; u + a + all hops. Column: average log(reach).]

Action Level Factors
As with the user-level and social neighborhood features, we expect the reach to increase with the popularity of the action. One way to assess popularity in our case is to count the number of users performing the action (the number of users joining the group). Figure 5(a) plots the number of users performing action $a$ versus $reach_a(u)$. To find how influenceable the action is, we define the action's influencing ability $P_{inf}(a)$ as the ratio of the number of users performing action $a$ under a friend's influence to the total number of users performing action $a$:
$$P_{inf}(a) = \frac{\sum_{u_j} I(u_j, a)}{\#\{\text{users doing } a\}}$$
where $I(u_j, a)$ is 1 if $u_j$ performed action $a$ under a friend's influence, and 0 otherwise. Figure 5(b) shows $reach_a(u)$ as a function of $P_{inf}(a)$ and confirms the intuition that as the action becomes more influencing, the reach for the action also increases. Next, we check whether the action-level factors, combined with the user and social neighborhood factors, have any impact on the reach value. As before, we fix a threshold (…), and if $P_{inf}(a)$ is greater than the threshold, we say that the action is contagious (active). Table 2 analyzes the impact on reach as the user, action and neighbors become active. Row 1 gives the average reach value when only the user is active ($P_{inf}(u) \geq$ … and $P_{inf}(a) <$ …). In row 2, only the action is
active, while in row 3 both the user and the action are active, and so on. This shows that all the factors, when active in conjunction, can increase the $reach_a(u)$ value further.
[Figure 5: Action-level factors: (a) number of users doing action $a$ vs. $reach_a(u)$; (b) action influenceability $P_{inf}(a)$ vs. $reach_a(u)$.]

Experimental evaluation
Based on our analysis in the previous section, we propose a solution to the problem of predicting the reach value for an action and a user. In this section, we describe the train/test split, the model and the features used.
Training and Testing
From the data, we have user-action pairs $(u_i, a)$ and the observed $reach_a(u_i)$ values. We compute the log of $reach_a(u_i)$ and use it as the reach value. For each user-action pair, the goal is to predict the reach of the cascade as if we did not know about the cascade event. Each entry in our dataset is a tuple $(u_i, a, F, reach_a(u_i))$, where $F$ is the set of features described earlier. All feature values in $F$ are computed on the action log built up to time M, and we use the data after time M for the experiments. The idea is to learn from past user and action behavior to predict the future reach of a user-action pair. In our case, a user performs an action only once; hence we test the model on pairs $(u_i, a)$ such that $u_i$ did not perform action $a$ before time M. In cases where a user can perform the same action more than once (e.g., clicking on an ad), the model can also be used to predict the reach for the same user-action pair. We split our data in the ratio …:…:… for training, testing and validation respectively. We ensure that these sets are non-overlapping with respect to actions, that is, all the tuples $(u_i, a, F, reach_a(u_i))$ with action $a$ go into exactly one of the training, test or validation sets. As our goal is to predict a real-valued number ($reach_a(u)$), we cast the task as a regression problem. We use gradient boosted decision trees (GBDT) as the regression model for predicting the reach values. The GBDT parameters, the number of trees and the number of leaf nodes per tree, were set to … and … respectively. We use the mean squared error (MSE) as our primary measure of performance.
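The action-disjoint split described above can be sketched by assigning each action, rather than each row, to a partition. In this Python sketch the 0.6/0.2/0.2 ratios are hypothetical placeholders (the paper's ratio is not reproduced here), the row layout `(user, action, features, log_reach)` is assumed, and hashing the action id with SHA-256 is our own choice to make the assignment deterministic; because whole actions are assigned, the row-level proportions only approximate the ratios.

```python
import hashlib
from typing import List, Sequence, Tuple


def split_by_action(
    rows: Sequence[tuple],
    ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # hypothetical ratios
) -> Tuple[List[tuple], List[tuple], List[tuple]]:
    """Split (user, action, features, log_reach) rows into train/test/validation
    so that every row of a given action lands in the same split."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    train: List[tuple] = []
    test: List[tuple] = []
    val: List[tuple] = []
    for row in rows:
        action = row[1]
        # Stable pseudo-random number in [0, 1) that depends only on the action id.
        digest = hashlib.sha256(str(action).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big") / 2 ** 64
        if x < ratios[0]:
            train.append(row)
        elif x < ratios[0] + ratios[1]:
            test.append(row)
        else:
            val.append(row)
    return train, test, val
```

Keeping actions disjoint prevents the model from memorizing an action's reach from its training rows, which would otherwise inflate the test performance of the action-level features.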
This metric is the mean squared error between the model's predicted $reach_a(u)$ value and the actual (observed) reach value. We also use KL-divergence as a second performance measure. The improvements of the model are reported on the MSE metric. We compare our model against two baselines:
Baseline-1: the average reach of the user across all the actions in the data before time M:
$$\overline{reach}(u_i) = \frac{\sum_{a} reach_a(u_i)}{\#\{\text{actions by } u_i\}}$$
Baseline-2: the average reach of the action (group) across all the users in the data before time M:
$$\overline{reach}(a) = \frac{\sum_{u_i} reach_a(u_i)}{\#\{\text{users doing action } a\}}$$
Features
The features used by the model are described in Table 3. Apart from the features listed, we add $\log(f + 1)$ as an additional feature for every numeric feature $f$. We also include an always-on feature set to 1. In all, the model uses … features. In the user-level feature set, apart from the influence and prone probabilities, we also consider the average influence probability of the user across all its friends and all the actions performed:
$$avginf(u_i) = \frac{\#\{\text{actions propagated}\}}{n_{friends} \times n_{actions}}$$
The social graphs and the action logs can be used effectively to measure the importance of a user in the network. Specifically, one can leverage the HITS algorithm by Kleinberg and the PageRank algorithm by Page et al. to identify the authoritative users in the graph. The HITS algorithm gives two scores per node: the authority score and the hub score. Both scores fit well into the influencer-influenced paradigm: the authority score indicates the influencing power of a user, and the hub score measures the susceptibility of a user to peer influence. The user rank score is similar to the PageRank score and gives the authority of the user in the action graph. The action-level features $n_{users}$, $n_{prone\_users}$ and $P_{inf}(a)$ indicate the popularity of the action. Besides these, we also include a Flickr-specific feature, topic_cnt: each group in Flickr mentions various topics related to the group, and we use the number of topics of each group as a feature. The social neighborhood features try to capture the capability of user $u_i$'s neighborhood to extend the cascades triggered by $u_i$.
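The two baselines and the MSE metric can be sketched as follows. This is a minimal Python illustration under assumptions of ours: rows are `(user, action, reach value)` triples, the per-user and per-action means are estimated from a non-empty `history` observed before time M, and a user or action absent from that history falls back to the global mean (the paper does not say how such cases were handled). The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, Hashable, float]  # (user, action, observed reach value)


def _means(groups: Dict[Hashable, List[float]]) -> Dict[Hashable, float]:
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def baseline_mse(history: Iterable[Row], test: Iterable[Row]) -> Tuple[float, float]:
    """MSE of Baseline-1 (mean reach of the user) and Baseline-2 (mean reach of
    the action), both estimated from history observed before time M."""
    history = list(history)
    by_user: Dict[Hashable, List[float]] = defaultdict(list)
    by_action: Dict[Hashable, List[float]] = defaultdict(list)
    for user, action, y in history:
        by_user[user].append(y)
        by_action[action].append(y)
    user_mean, action_mean = _means(by_user), _means(by_action)
    fallback = sum(y for _, _, y in history) / len(history)  # assumed for unseen keys

    sq_err_1, sq_err_2, n = 0.0, 0.0, 0
    for user, action, y in test:
        sq_err_1 += (user_mean.get(user, fallback) - y) ** 2    # Baseline-1
        sq_err_2 += (action_mean.get(action, fallback) - y) ** 2  # Baseline-2
        n += 1
    return sq_err_1 / n, sq_err_2 / n
```

When reach values vary less within an action than within a user, the action means are better predictors, which is the effect the Results section attributes to the lower coefficient of variation across actions.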
In addition to the average influence and prone probabilities, we also compute the number of friends of $u_i$ at each hop level.

Results
In this section, we present the combined effect of the various factors on the $reach_a(u)$ value. Table 4 shows the performance of both baselines and of the machine-learned model. In Table 4, Improvement-1 and Improvement-2 show the improvements over baseline-1 and baseline-2 respectively. All the results presented are statistically significant at the 99% level; we used a paired t-test to test statistical significance. As shown, the prediction model gives a good improvement of …% over baseline-1 and an improvement of …% over baseline-2. Besides, the model does better than baseline-2 by …% in terms of improvement over baseline-1. Also, baseline-2 performs substantially better than baseline-1. On further investigation of the data, we found that the average coefficient of variation was … for the actions, while for the users it was …. Hence, the lower variance in the reach values within an action results in baseline-2 performing better than baseline-1.

Table 3: Feature set
  User-level features
    n_friends: Number of friends of the user
    n_actions: Number of actions performed by u
    gender: M - male, F - female, X - unknown
    user_rank: Rank of the user (similar to PageRank)
    auth: Authority score of the user in the action graph using the HITS algorithm
    hub: Hub score of the user in the action graph using the HITS algorithm
    $n_{i \to j}$: Number of actions propagated
    $P_{inf}(u)$: Influence probability of the user
    $P_{prone}(u)$: Prone probability of the user
    $avginf(u)$: Average influence across all users and groups
  Neighborhood features
    $n_{hop_k}$: Number of users at hop level k = 1, 2, 3
    $P_{inf}(u){:}hop_k$: Average of the influence probabilities of all friends at hop k from the user
    $P_{prone}(u){:}hop_k$: Average of the prone probabilities of all friends at hop k from the user
  Action-level features
    $n_{users}$: Number of users who have performed action a
    $n_{prone\_users}$: Number of users performing action a under peer influence
    $P_{inf}(a)$: Influence probability of action a
    topic_cnt: Number of topics in the group (specific to the Flickr dataset)

[Table 4: Improvements of the model compared to the baselines. Columns: System; MSE; KL Div. (×…); Improvement-1; Improvement-2. Rows: Baseline-1; Baseline-2; Model.]

In addition to the overall performance of the model, it is also interesting to assess the contribution of each feature in the learned model. Table 5 shows the feature importance of the top features. The feature contributions are scaled with respect to the top-performing feature, $n_{users}$. As shown in Table 5, the top few performing features come from the action-level category, showing a healthy contribution to the overall prediction. This means that the more contagious the action, the better the reach of that action for a user. Following the top few action-level features, various user-level and social-neighborhood-level features show decent contributions.

[Table 5: Feature importance of the top features (rank, feature, category, importance scaled to $n_{users}$). Rank 1 is $n_{users}$ (action); the other top-ranked features include $n_{prone\_users}$ (action), log($n_{friends}$) and log(hub) (user), and hop-level probabilities and log($n_{hop_k}$) counts (neighborhood).]

The results presented in Table 4 were obtained with $\tau$ = one week. Next, we vary $\tau$ to see whether changing its value affects any of the improvements reported in Table 4. Table 6 shows the performance of the model for various $\tau$ values. Note that changing $\tau$ changes the action graphs and hence the influence and prone probabilities; most of the features are recomputed for every value of $\tau$. As shown in Table 6, the improvements over both baselines are consistent across the $\tau$ values.

[Table 6: Improvements of the model compared to the baselines for various $\tau$ values (four days, six days, two weeks). For each $\tau$: rows Baseline-1, Baseline-2 and Model; columns MSE, KL Div. (×…), Improvement-1 and Improvement-2.]

Discussion
We have analyzed various factors contributing to the cascades triggered by a user. The analysis yields several interesting insights. There is a direct association between the reach and various user, action and neighborhood factors: the more contagious these factors are, the bigger the reach for that user and action. While there is evidence of social influence, the action itself carries a large amount of predictive power, augmented
by the influencing abilities of the user and the neighborhood. The analysis of feature contributions and the performance of baseline-2 support the claim that the action is the dominant factor in predicting its own spread. As mentioned, the lower variance of the reach values across actions, compared to across users, results in the action playing a more important role. Bakshy et al. did similar work on predicting the average size of the cascades triggered by a user. Interestingly, they found that the content itself carries little predictive power in determining the length of cascades. However, there are a few subtle differences. They focus on evaluating various targeting strategies to maximize the spread of influence, while we focus on analyzing the contribution of the social network and of the action to the cascades. While they consider the content itself for predicting the cascades, we use the popularity of the content as a feature. Also, as the governing social dynamics differ between the two networks, the cascades are driven by different diffusion mechanisms. In this paper, the model learns its predictions from the past events of the user and action. In cases where we need to identify the influencers for new actions for which we have no past information, the features can be inherited from similar actions that do have past information. The notion of similarity largely depends on the context. In our case, for a new group, similarity can be based on the topics discussed in the groups. Another intuitive example where new actions are prevalent is the diffusion of an ad's influence in a social network, where clicking on the ad or buying the advertised product can be considered an action. In such cases, ads from the same advertiser or for the same product can be used as a measure of similarity. In scenarios where similarity between actions cannot be defined, identifying the influencers has to rely on the user and neighborhood features. Distinguishing homophily and influence is a tough problem in general. In this paper, we use the temporal difference between actions to distinguish homophily and influence.
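The feature-inheritance idea for new actions can be sketched in Python. The paper only says that similarity between groups can be based on their topics; measuring it with Jaccard overlap of topic sets, keeping the `k` most similar known actions, and taking an unweighted mean of their features (with a missing feature counted as 0.0) are all assumptions of this sketch, and `inherit_action_features` is a hypothetical name.

```python
from typing import Dict, Hashable, Set, Tuple

Known = Dict[Hashable, Tuple[Set[str], Dict[str, float]]]  # action -> (topics, features)


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inherit_action_features(new_topics: Set[str], known: Known, k: int = 3) -> Dict[str, float]:
    """Average the action-level features of the k known actions whose topic sets
    overlap most with the new action's topics (Jaccard similarity, an assumption)."""
    scored = sorted(
        ((jaccard(new_topics, topics), feats) for topics, feats in known.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbors = [feats for score, feats in scored[:k] if score > 0]
    if not neighbors:
        return {}  # no similar action: rely on user and neighborhood features only
    names = set().union(*(f.keys() for f in neighbors))
    # Unweighted mean; a feature missing from a neighbor counts as 0.0 (assumption).
    return {n: sum(f.get(n, 0.0) for f in neighbors) / len(neighbors) for n in names}
```

Returning an empty dictionary when no known action shares a topic mirrors the fallback in the text: without a usable notion of similarity, prediction has to rely on the user and neighborhood features alone.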
Most research on distinguishing homophily and influence either operates at the network level or is difficult to implement on large online networks. There is a clear need for more robust and scalable techniques that distinguish these two types of diffusion at the granularity of individual action propagations.

Conclusion
In this paper, we analyzed the correlation between users, actions and their reach. The analysis showed that there is a positive correlation between the reach and various user-level, action-level and neighborhood-level factors, and that when these factors are considered together, their combined effect increases the reach further. Based on this analysis, we built a machine learning model to predict the average reach for a user-action pair. We empirically showed that the action, user and neighborhood features combined give a good prediction of the average reach of a user in the graph. While features pertaining to the action play the dominant role in the prediction, they are aptly supported by the user and neighborhood features. The model performs better than several baseline systems. We used social graphs generated from Flickr for our experiments. It would be interesting to repeat the experiments on other online social graphs, such as Twitter, Facebook or an IM network, to see whether they show similar trends. We leave this as future work.

References
Anagnostopoulos, A.; Kumar, R.; and Mahdian, M. Influence and correlation in social networks. In KDD. ACM.
Aral, S., and Walker, D. Creating social contagion through viral product design: A randomized trial of peer influence in networks.
Aral, S.; Muchnik, L.; and Sundararajan, A. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences.
Bakshy, E.; Hofman, J. M.; Mason, W. A.; and Watts, D. J. Everyone's an influencer: Quantifying influence on Twitter. In WSDM. New York, NY, USA: ACM.
Bhatt, R.; Chaoji, V.; and Parekh, R. Predicting product adoption in large-scale social networks. In CIKM.
Cha, M.; Mislove, A.; Adams, B.; and Gummadi, K. P. Characterizing social cascades in Flickr. In WOSP: Proceedings of the First Workshop on Online Social Networks. ACM.
Chen, W.; Wang, Y.; and Yang, S. Efficient influence maximization in social networks. In KDD. ACM.
Domingos, P., and Richardson, M. Mining the network value of customers. In KDD. ACM.
Goyal, A.; Bonchi, F.; and Lakshmanan, L. V. Learning influence probabilities in social networks. In WSDM. ACM.
Hartline, J.; Mirrokni, V.; and Sundararajan, M. Optimal marketing strategies over social networks. In WWW. ACM.
Kempe, D.; Kleinberg, J.; and Tardos, E. Maximizing the spread of influence through a social network. In KDD. ACM.
Kleinberg, J. M. Authoritative sources in a hyperlinked environment. In SODA: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.
La Fond, T., and Neville, J. Randomization tests for distinguishing social influence and homophily effects. In WWW. ACM.
Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; and Glance, N. Cost-effective outbreak detection in networks. In KDD. ACM.
Page, L.; Brin, S.; Motwani, R.; and Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web.
Richardson, M., and Domingos, P. Mining knowledge-sharing sites for viral marketing. In KDD. ACM.
Saito, K.; Nakano, R.; and Kimura, M. Prediction of information diffusion probabilities for independent cascade model. In KES. Berlin: Springer-Verlag.
Singla, P., and Richardson, M. Yes, there is a correlation: From social networks to personal behavior on the web. In WWW. ACM.
Tang, J.; Sun, J.; and Wang, C. Social influence analysis in large-scale networks. In KDD. ACM.
Watts, D. J., and Dodds, P. S. Influentials, networks, and public opinion formation.