Who Will Follow You Back? Reciprocal Relationship Prediction


John Hopcroft
Department of Computer Science, Cornell University, Ithaca NY 14853

Tiancheng Lou
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China

Jie Tang
Department of Computer Science, Tsinghua University, Beijing 100084, China

(Authors are in alphabetic order. The work was done when the last two authors were visiting Cornell University.)

ABSTRACT
We study the extent to which the formation of a two-way relationship can be predicted in a dynamic social network. A two-way (called reciprocal) relationship, usually developed from a one-way (parasocial) relationship, represents a more trustful relationship between people. Understanding the formation of two-way relationships can provide us insights into the micro-level dynamics of the social network, such as what the underlying community structure is and how users influence each other. Employing Twitter as a source for our experimental data, we propose a learning framework to formulate the problem of reciprocal relationship prediction into a graphical model. The framework incorporates social theories into a machine learning model. We demonstrate that it is possible to accurately infer 90% of reciprocal relationships in a dynamic network. Our study provides strong evidence of the existence of structural balance among reciprocal relationships. In addition, we have some interesting findings, e.g., the likelihood of two elite users creating a reciprocal relationship is nearly 8 times higher than the likelihood of two ordinary users. More importantly, our findings have potential implications such as how social structures can be inferred from individual behaviors.

Categories and Subject Descriptors
H.2.8 [Database Management]: Data Mining; J.4 [Social and Behavioral Sciences]: Miscellaneous; H.4.m [Information Systems]: Miscellaneous

General Terms
Algorithms, Experimentation

Keywords
social network, reciprocal relationship, social influence, predictive model, link prediction, Twitter
1. INTRODUCTION
Online social networks (e.g., Twitter, Facebook, Myspace) significantly enlarge our social circles. One can follow any elite users (celebrities), e.g., politicians, models, actors, and athletes, or close friends in her physical social network. An interesting question here is: when you follow a number of users, who will follow you back? A more specific question is: if you follow celebrities (elite users) on Twitter, do you think they will follow you back? The answer is often No, but sometimes Yes. There are a number of top users with tens of thousands of followers who follow everyone back. Some even use tools to follow back automatically, while others go through their follower lists and add new followers manually. Awareness of how these relationships are created can benefit many applications such as friend suggestion, community detection, and word-of-mouth product promotion.

In social science, relationships between individuals are classified into two categories: one-way (called parasocial) relationships and two-way (called reciprocal) relationships [9]. The most common form of the former is the one-way relationship between celebrities and their audience or fans, while the most common form of the latter is the two-way relationship between close friends. Twitter and Facebook are respectively typical examples of the two types of social relationships. Social relationships form the basis of the social structure. Indeed, social relationships are always the basic object of analysis for social scientists, for instance, in Max Weber's theory of social action [29].
Understanding the formation of social relationships can give us insights into the micro-level dynamics of the social network, such as how an individual user influences her/his friends through different types of social relationships [26], and how the underlying social structure changes with the dynamics of relationship formation [23].

Employing Twitter as the basis of our analysis, we study how a two-way (reciprocal) relationship is developed from a one-way (parasocial) relationship. Specifically, we try to answer: when you follow a particular user (either an elite user or an ordinary user), how likely is she/he to follow you back? This problem also implicitly exists in other social networks such as Facebook and LinkedIn: when you send a friend request to somebody, how likely is she/he to confirm your request?

Previous research on social relationships can be classified into three categories: link prediction [2, 6, 7, 23], relationship type inferring [4, 5, 27], and social behavior prediction [, 25, 33]. Backstrom and Leskovec [2] proposed an approach called supervised random walk to predict and recommend links in social networks. Crandall et al. [4] investigated the problem of inferring social ties between people from co-occurrence in time and space.
Wang et al. [27] proposed an unsupervised algorithm to infer advisor-advisee relationships from a publication network. However, little research systematically studies how two-way relationships can be developed from one-way relationships. More fundamentally, what are the underlying factors that essentially influence the formation of two-way relationships, and how can existing social theories (e.g., structural balance theory and homophily) be connected to the formation process?

In this paper, we conduct a systematic investigation of the problem of two-way (reciprocal) relationship prediction. We precisely define the problem and propose a Triad Factor Graph (TriFG) model. The TriFG model incorporates social theories into a semi-supervised learning model, where we have some labeled training data (two-way relationships) but with low reciprocity [3]. Given a historic log of users' following actions from time $1$ to $t$, we try to learn a predictive model to infer whether user A will add a follow-back link to user B at time $(t+1)$ if user B creates a new follow link to user A at time $t$. We evaluate the proposed model on a Twitter data set consisting of 3,442,659 users and their profiles, tweets, and following behaviors (new-follow or follow-back links) for nearly two months.

Results. We show that incorporating social theories into the proposed factor graph model can significantly improve the performance (+22% to +27% by F1-Measure) of two-way (reciprocal) relationship prediction compared with several alternative methods. Our study also reveals several interesting phenomena:

1. Elite users tend to follow each other. The likelihood of an elite user following back another elite user is nearly 8 times higher than that of two ordinary users and 3 times that of an elite user and an ordinary user.

2. Two-way relationships on Twitter are balanced, but one-way relationships are not. More than 88% of social triads (groups of three people) with two-way relationships satisfy the social balance theory, while one-way relationships are unbalanced (merely 25% of them satisfy the balance theory).

3. Social networks are going global, but also stay local. No matter how far a user is from you, the likelihood that she/he follows you back is almost the same. On the other hand, the number of two-way relationships between users within the same time zone is 2 times higher than the number between users from different time zones.

Organization. Section 2 formulates the problem. Section 3 introduces the data set and our analyses of it. Section 4 explains the proposed model and describes the algorithm for learning the model. Section 5 presents experimental results that validate the effectiveness of our methodology. Finally, Section 6 reviews the related work and Section 7 concludes this work.

2. PROBLEM DEFINITION
In this section, after presenting several definitions, we formally define the targeted problem of this work. We formulate the problem in the context of Twitter to keep things concrete, though adaptation of this framework to other social network settings is straightforward.

The Twitter network can be modeled as a directed graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of users, and $E \subseteq V \times V$ is the set of directed links between users. Each directed link $e_{ij} = (v_i, v_j) \in E$ indicates that user $v_i$ follows user $v_j$.

The Twitter network is dynamic in nature, with links added and removed over time. However, our preliminary statistics on a large Twitter data set show that users tend to add new links much more frequently than they remove existing links (e.g., 97% of changes to links are additions of new links). Therefore, adding new links forms the structure of the Twitter network. A new link results when a user performs a behavior of following another user (back) on Twitter. In particular, we define two types of link behaviors:

Definition 1. New-follow and Follow-back: Suppose that at time $t$, user $v_i$ creates a link to $v_j$, who has no previous link to $v_i$; then we say $v_i$ performs a new-follow behavior on $v_j$. When user $v_i$ creates a link to $v_j$ at time $t$, and $v_j$ already has a link to $v_i$ before time $t$, we say $v_i$ performs a follow-back behavior on $v_j$.
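Definition 1 can be made concrete with a short sketch. The function name `classify_links`, the edge representation as `(follower, followee)` tuples, and the example user names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (follower, followee): the follower follows the followee


def classify_links(existing: Set[Edge], new_links: Iterable[Edge]) -> List[Tuple[Edge, str]]:
    """Label each link created at time t as 'new-follow' or 'follow-back'.

    `existing` holds the links present before time t. A new link v_i -> v_j
    is a follow-back when v_j -> v_i is already in `existing`; otherwise it
    is a new-follow.
    """
    labelled = []
    for follower, followee in new_links:
        reverse = (followee, follower)
        kind = "follow-back" if reverse in existing else "new-follow"
        labelled.append(((follower, followee), kind))
    return labelled


# Hypothetical example: alice already follows gaga, bob already follows alice.
existing_links = {("alice", "gaga"), ("bob", "alice")}
events = classify_links(existing_links, [("gaga", "alice"), ("alice", "carol")])
for edge, kind in events:
    print(edge, kind)
```

Links created within the same time stamp are deliberately not added to `existing`, which mirrors the "before time $t$" condition in the definition.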
The new-follow and follow-back behaviors respectively correspond to the one-way (parasocial) relationship and the two-way (reciprocal) relationship in sociology. In this work, we focus on investigating the formation of follow-back behaviors. For simplicity, let $y^t_{ij} = 1$ denote that user $v_i$ follows back $v_j$ at time $t$ and $y^t_{ij} = 0$ denote that user $v_i$ does not follow back. We are concerned with the following prediction problem:

Problem 1. Follow-back prediction. Let $\langle 1, \ldots, t \rangle$ be a sequence of time stamps with a particular time granularity (e.g., day, week, etc.). Given Twitter networks from time $1$ to $t$, $\{G^t = (V^t, E^t, Y^t)\}$, where $Y^t$ is the set of follow-back behaviors at time $t$, the task is to find a predictive function

$$f : (\{G^1, \ldots, G^t\}) \rightarrow Y^{(t+1)},$$

such that we can infer the follow-back behaviors at time $(t+1)$.

It bears pointing out that our problem is very different from existing link prediction [2, 7, 23] and social action prediction problems [25, 33]. First, as the Twitter network is evolving over time, it is infeasible to collect a complete network at time $t$. Thus it is important to design a method that can take the unlabeled data into consideration as well. Second, it is unclear which fundamental factors cause the formation of follow-back relationships. Finally, one needs to incorporate the different factors (e.g., social theories, statistics, and our intuitions) into a unified model to better predict the follow-back relationships.

3. DATA AND OBSERVATIONS

3.1 Data Collection
We aim to find a large set of users and a continuously updated network among these users, so that we can use the data set as the gold standard to evaluate different approaches for our prediction. To begin the collection process, we selected the most popular user on Twitter, i.e., Lady Gaga, and randomly collected , of her followers. We took these users as seed users and used a crawler to collect all followers of these users by traversing following edges.
We continued the traversing process, which produced in total 3,442,659 users and 56,893,234 following links, with an average of 728,59 new links per day. The crawler monitored the change of the network structure from /2/2 to 2/23/2. We also extracted all tweets posted by these users; in total there are 35,746,366 tweets.

In our analysis, we also consider the geographic location of each user. Specifically, we first extracted the location from the profile of each user (footnote 2), and then fed the location information to the Google Maps API to fetch the corresponding longitude and latitude values. In this way, we obtained the longitude and latitude of about 59% of the users in our data set. A more detailed analysis and an online demonstration are publicly available.

Footnote 2: For example, Lady Gaga's location information is: Location: New York, NY.

[Figure 1: Geographic distance correlation. Panels: (a) Global, (b) Local. X-axis: time zone difference (0 indicates that users are located in the same time zone). Y-axis: (a) probability that one user follows back another user, conditioned on the time zone difference of the two users; (b) number of two-way relationships among users from the same time zone or different time zones.]

3.2 Observations
We first engage in some high-level investigation of how different factors influence the formation of follow-back (reciprocal) relationships, since a major motivation of our work is to find the underlying factors and their influence on this task. In particular, we study the interplay of the following factors with the formation of follow-back relationships: (1) Geographic distance: Do users have a higher probability of following each other when they are located in the same region? (2) Homophily: Do similar users tend to follow each other? (3) Implicit network: How does the following network on Twitter correlate with other implicit networks, e.g., retweet and reply networks? (4) Social balance: Does the two-way relationship network on Twitter satisfy the social balance theory [6]? To what extent?

Geographic distance. Figure 1 shows the correlation between geographic distance and the probability that two users create a two-way relationship (i.e., follow each other back). Interestingly, it seems that online social networks indeed go global: Figure 1(a) shows the likelihood of a user following another user back when they are from the same time zone or from different time zones. Clearly, geographic distance no longer stops users from developing a trustful (reciprocal) relationship.
Figure 1(b) shows another statistic, which gives a different perspective: the Twitter network (in some sense) still stays local. The average number of two-way (reciprocal) relationships between users from the same time zone is about 5 times higher than the number between users at a distance of three time zones.

Homophily. The principle of homophily [5] suggests that users with similar characteristics (e.g., social status, age) tend to associate with each other. In particular, we study two kinds of homophily on the Twitter network: link homophily and status homophily.

For link homophily, we test whether users who share common links (followers or followees) have a tendency to associate with each other. Figure 2 clearly shows that the probability of two users following each other back when they share common neighbors is much higher than usual. When the number of common neighbors with two-way relationships increases to 3, the likelihood of two users following each other back also triples. The effect is more pronounced when the number increases to . But it is worth noting that this only works for two-way (reciprocal) relationships and does not hold for one-way (parasocial) relationships (as indicated in Figure 2).

[Figure 2: Link homophily. X-axis: number of common neighbors. Y-axis: probability that two users follow back each other, conditioned on the number of common neighbors of two-way relationships (or one-way relationships).]

For status homophily, we test whether two users with similar social status are more likely to associate with each other. We categorize users into two groups (elite users and ordinary users) by three different algorithms: PageRank [22] (footnote 3), #degree, and the (α, β) algorithm [8] (footnote 4). Specifically, with PageRank, we estimate the importance of each user according to the network structure, and then select as elite users the top % of users (footnote 5) with the highest PageRank scores and the rest as ordinary users; with #degree, we select the top % of users with the highest in-degree as elite users and the rest as ordinary users.
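The #degree variant of this elite/ordinary split is simple enough to sketch. The function names, the tie-breaking rule, the toy edges, and the `fraction` cut-off value used in the example are all assumptions made for illustration; only the idea of ranking users by in-degree comes from the text.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def elite_by_indegree(edges: Iterable[Tuple[str, str]], fraction: float) -> Set[str]:
    """Return the top `fraction` of users ranked by in-degree (follower count).

    Each edge is (follower, followee). Ties are broken alphabetically, which
    is an arbitrary choice made only to keep the sketch deterministic.
    """
    indegree = Counter()
    users = set()
    for follower, followee in edges:
        users.update((follower, followee))
        indegree[followee] += 1
    k = max(1, int(len(users) * fraction))  # keep at least one elite user
    ranked = sorted(users, key=lambda u: (-indegree[u], u))
    return set(ranked[:k])


def pair_group(a: str, b: str, elite: Set[str]) -> str:
    """Bucket a user pair the way the status-homophily comparison does."""
    n_elite = (a in elite) + (b in elite)
    return ("ordinary-ordinary", "ordinary-elite", "elite-elite")[n_elite]


# Hypothetical toy network: three users follow "star", and a follows b.
toy_edges = [("a", "star"), ("b", "star"), ("c", "star"), ("a", "b")]
elite = elite_by_indegree(toy_edges, fraction=0.25)
print(elite, pair_group("a", "star", elite), pair_group("a", "b", elite))
```

Follow-back probabilities can then be estimated separately for each of the three buckets returned by `pair_group`.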
For (α, β), we input the size of the core community as 2, and after running the algorithm, we use the users selected in the core community as elite users and the rest as ordinary users. Then, we examine the difference in follow-back behaviors among the two groups of users. Figure 3 clearly shows that, though the three algorithms present different statistics, elite users have a much stronger tendency to follow each other: the likelihood of two elite users following each other back is nearly 8 times higher than that of ordinary users (by the (α, β) algorithm). The (α, β) algorithm seems better able to distinguish elite users from ordinary users in our problem setting. This is because, besides the global network structure, the (α, β) algorithm also considers the community structure among elite users.

Footnote 3: PageRank is an algorithm to estimate the importance of each node in a network.
Footnote 4: The (α, β) algorithm is designed to find core members (elite users) in a social network.
Footnote 5: Statistics have shown that less than % of the Twitter users produce 5% of its content [3].

[Figure 3: Status homophily by different algorithms. Y-axis: probability that two users follow back each other, conditioned on whether the two users are from the same group of elite/ordinary users or from different groups. #Degree, PageRank, and (α, β) are three algorithms to distinguish elite users from ordinary users.]

Implicit structure. On Twitter, besides the explicit network with following links, there are also some implicit network structures that can be induced from the textual information. For example, user A may mention user B in her tweet, which is called a reply link; user A may forward user B's tweet, which results in a retweet link. We study how the implicit links correlate with the formation of follow-back relationships on Twitter. Figure 4 clearly shows that when users A and B retweet or reply to each other's tweets, the likelihood of their following each other back is higher (3 times than chance). Another interesting phenomenon is that, compared with replying to someone's tweet, retweeting (forwarding) her tweet seems to be more helpful (5% vs. 9%) in winning her follow-back.

[Figure 4: Implicit network correlation. Y-axis: probability that user B follows user A back, conditioned on one user (A or B) retweeting or replying to the other user's tweet. Conditions shown: no retweet (reply), A retweets (replies to) B, B retweets (replies to) A, both.]

Structural balance. Now, we connect our work to a basic social psychological theory: structural balance theory [6]. Let us first explain the structural balance property. For every group of three users (called a triad), the balance property implies that either all three of these users are friends or only one pair of them are friends. Figure 5 shows such an example. To adapt the theory to our problem, we can map either the two-way relationship or the one-way relationship onto the friendship. Then we examine how the Twitter network (with only two-way relationships or only one-way relationships) satisfies the structural balance property. More precisely, we compare the probabilities that the resultant triads satisfy the balance theory based on two-way relationships and on one-way relationships on Twitter. Figure 6 clearly shows that it is much more likely (88%) for users to be connected in a balanced structure of two-way relationships, whereas with one-way relationships the resultant structure is very unbalanced. This is because two users are very likely to follow the same movie star without knowing each other, which results in an unbalanced triad (Figure 5 (C)).

[Figure 5: Illustration of structural balance theory. (A) and (B) are balanced, while (C) and (D) are not balanced.]

[Figure 6: Structural balance correlation. Y-axis: probability that a triad creates two-way (reciprocal) relationships, conditioned on whether the resultant structure is balanced or not.]

In summary, according to the statistics above, we have the following observations:

1. Geographic distance has a pronounced effect on the number of two-way relationships created between users, but little effect on the likelihood of users following each other back.

2. Users with common friends of two-way relationships have a tendency (link homophily) to follow each other.

3. Elite users have a much stronger tendency (status homophily) to follow each other than ordinary users.

4. The implicit networks of retweet or reply links have a strong correlation with the formation of two-way (reciprocal) relationships.

5. The network of two-way relationships on Twitter is balanced (88% of triads satisfy the structural balance property), while the network of one-way relationships is unbalanced (7% are unbalanced).

4. MODEL FRAMEWORK
In this section, we propose a novel Triad Factor Graph (TriFG) model to incorporate all the information within a single entity for better modeling and predicting the formation of two-way (follow-back) relationships.

For an edge $e_{ij} \in E$, if user $v_j$ follows $v_i$ at time $t$, our task is to predict whether user $v_i$ will follow $v_j$ back, i.e., $y_{ij} = 1$ or $0$. For ease of explanation, we introduce a slight change of notation. We write each edge as $e_i$ with its two end users as $v_i$ and $v^u_i$. For the follow-back prediction task, we assume that $v_i$ follows $v^u_i$ at time $t$, and our task is to predict whether $v^u_i$ will follow $v_i$ back at time $(t+1)$. Based on the observations in Section 3, we define a number of attributes for each edge, denoted as $x_i$. The $|E| \times d$ attribute matrix $X$ describes edge-specific characteristics, where $d$ is the number of attributes. For example, on Twitter, an attribute can be defined as whether the two end users are from the same time zone. An element $x_{ij}$ in the matrix $X$ indicates the $j$-th attribute value of edge $e_i$.

4.1 The Proposed Model
We propose a Triad Factor Graph (TriFG) model. The name is derived from the idea that we incorporate social theories (structural balance and homophily) over triads into the factor graph model. Figure 7 shows the graphical structure of the TriFG model.
The left figure shows the following network of six users at time $t$. Blue arrows indicate new follow actions, black arrows indicate follow actions performed before time $t$, and a blue mark indicates that user $v^u_i$ does not follow user $v_i$ back at time $t$. The right figure is the factor graph model derived from the left input network. Each gray ellipse indicates a relationship $(v^u_i, v_i)$ between users and each white circle indicates the hidden variable $y_i$, with $y_i = 1$ representing that $v^u_i$ performs a follow-back action, $y_i = 0$ that it does not, and $y_i = ?$ unknown, which is the variable we need to predict. Factor $h(\cdot)$ represents a balance factor function defined on a triad; and $f(v_i, v^u_i, y_i)$ (or $f(x_i, y_i)$) represents a factor that captures the information associated with edge $e_i$.

[Figure 7: Graphical representation of the TriFG model. The left figure shows the follow network at time $t$. Blue arrows indicate new follow actions, black arrows indicate previously existing follow links, and a blue mark indicates that user $v^u_i$ does not follow user $v_i$ back. The right figure is the TriFG model derived from the following graph. Each gray ellipse indicates a relationship $(v^u_i, v_i)$ between users and each white circle indicates the hidden variable $y_i$. $f(v_i, v^u_i, y_i)$ represents an attribute factor function and $h(\cdot)$ represents a triad factor function.]

Given a network at time $t$, i.e., $G^t = (V^t, E^t, X^t)$ with some known variables $y = 1$ or $0$ and some unknown variables $y = ?$, our goal is to infer the values of those unknown variables. For simplicity, we remove the superscript $t$ from all variables if there is no ambiguity. We begin with the posterior probability $P(Y \mid X, G)$. According to Bayes' theorem, we have

$$P(Y \mid X, G) = \frac{P(X, G \mid Y)\, P(Y)}{P(X, G)} \propto P(X \mid Y) \cdot P(Y \mid G) \qquad (1)$$

where $P(Y \mid G)$ denotes the probability of the labels given the structure of the network and $P(X \mid Y)$ denotes the probability of generating the attributes $X$ associated with each edge given their labels $Y$. Assuming that the generative probability of the attributes given the label of each edge is conditionally independent, we get

$$P(Y \mid X, G) \propto P(Y \mid G) \prod_i P(x_i \mid y_i) \qquad (2)$$

where $P(x_i \mid y_i)$ is the probability of generating attributes $x_i$ given the label $y_i$. Now, the problem is how to instantiate the probabilities $P(Y \mid G)$ and $P(x_i \mid y_i)$. In principle, they can be instantiated in different ways.
In this work, we model them in a Markov random field, and thus by the Hammersley-Clifford theorem [7], the two probabilities can be instantiated as:

$$P(x_i \mid y_i) = \frac{1}{Z_1} \exp\Big\{ \sum_{j=1}^{d} \alpha_j f_j(x_{ij}, y_i) \Big\} \qquad (3)$$

$$P(Y \mid G) = \frac{1}{Z_2} \exp\Big\{ \sum_{c} \sum_{k} \mu_k h_k(Y_c) \Big\} \qquad (4)$$

where $Z_1$ and $Z_2$ are normalization factors. Eq. 3 indicates that we define a feature function $f_j(x_{ij}, y_i)$ for each attribute $x_{ij}$ associated with edge $e_i$, and $\alpha_j$ is the weight of the $j$-th attribute; Eq. 4 indicates that we define a set of correlation feature functions $\{h_k(Y_c)\}_k$ over each triad $Y_c$ in the network. Here $\mu_k$ is the weight of the $k$-th correlation feature function.

Based on Eqs. 2-4, we define the following log-likelihood objective function $O(\theta) = \log P_\theta(Y \mid X, G)$:

$$O(\theta) = \sum_{i=1}^{|E|} \sum_{j=1}^{d} \alpha_j f_j(x_{ij}, y_i) + \sum_{c} \sum_{k} \mu_k h_k(Y_c) - \log Z \qquad (5)$$

where $Y_c$ is a triad derived from the input network, $Z = Z_1 Z_2$ is a normalization factor, and $\theta = (\{\alpha\}, \{\mu\})$ indicates a parameter configuration.

One example of factor decomposition is shown in Figure 7. There are six edges, three with known variables (two with $y = 1$ and one with $y = 0$) and three with unknown values ($y = ?$). We have four triads (e.g., $Y_c = (y_1, y_2, y_3)$) based on the structure of the input network. For each edge, we define a set of factor functions $f(v_i, v^u_i, y_i)$ (also written as $f(x_i, y_i)$).

    Input: network G^t, learning rate η
    Output: estimated parameters θ
    Initialize θ ← 0;
    repeat
        Perform LBP to calculate the marginal distribution of unknown variables P(y_i | x_i, G);
        Perform LBP to calculate the marginal distribution of triad c, i.e., P(y_c | X_c, G);
        Calculate the gradient of μ_k according to Eq. 7 (for α_j with a similar formula):
            ∂O(θ)/∂μ_k = E[h_k(Y_c)] − E_{P_μk(Y_c | X, G)}[h_k(Y_c)]
        Update parameter θ with the learning rate η:
            θ_new = θ_old + η · ∂O(θ)/∂θ
    until Convergence;

    Algorithm 1: Learning algorithm for the TriFG model.

We now briefly introduce possible ways to define the factor functions $f_j(x_{ij}, y_i)$ and $h_k(Y_c)$. $f_j(x_{ij}, y_i)$ is an attribute factor function. It can be defined as either a binary function or a real-valued function.
For example, for the implicit network feature, we simply define it as a binary feature: if user $v_i$ forwarded (retweeted) $v^u_i$'s tweet before time $t$ and user $v^u_i$ follows user $v_i$ back, then a feature $f_j(x_{ij} = 1, y_i = 1)$ is defined and its value is 1; otherwise 0. (Such a feature definition is often used in graphical models such as Conditional Random Fields [4].) For the triad factor function $h(Y_c)$, we define four features, two balanced and two unbalanced factor functions, as depicted in Figure 5. The triad function is defined as a binary function: if a triad satisfies the structural balance property, then the value of the corresponding triad factor function is 1, otherwise 0. More details of the factor function definitions are given in the Appendix.

4.2 Model Learning and Prediction
We now address the problem of estimating the free parameters and inferring users' follow-back behaviors.

Learning the TriFG model means estimating a parameter configuration $\theta = (\{\alpha\}, \{\mu\})$ that maximizes the log-likelihood objective function $O(\theta) = \log P_\theta(Y \mid X, G)$, i.e.,

$$\theta^\star = \arg\max_\theta O(\theta) \qquad (6)$$
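To make the two kinds of factor functions tangible, here is a minimal sketch of the unnormalized part of Eq. 5 (everything except $\log Z$). It assumes binary attributes, the binary attribute feature described above, and a single balance indicator with one weight `mu` in place of the four triad features; the function names and the toy numbers are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def is_balanced(triad_labels: Sequence[int]) -> bool:
    """A triad is balanced when all three pairs are reciprocal or exactly one is."""
    return sum(triad_labels) in (1, 3)


def attribute_score(x: Sequence[int], y: int, alpha: Sequence[float]) -> float:
    """Sum of alpha_j * f_j(x_ij, y_i), with f_j = 1 only when x_ij = 1 and y_i = 1."""
    return sum(a for a, x_ij in zip(alpha, x) if x_ij == 1 and y == 1)


def unnormalised_log_score(
    X: List[Sequence[int]],
    Y: Sequence[int],
    triads: List[Tuple[int, int, int]],
    alpha: Sequence[float],
    mu: float,
) -> float:
    """Attribute terms plus triad terms of Eq. 5, leaving out log Z."""
    attr = sum(attribute_score(x, y, alpha) for x, y in zip(X, Y))
    tri = sum(mu * is_balanced([Y[i] for i in triad]) for triad in triads)
    return attr + tri


# Hypothetical toy input: three edges, two binary attributes, one triad.
X = [[1, 0], [0, 1], [1, 1]]
triads = [(0, 1, 2)]
alpha, mu = [0.5, 1.0], 2.0
score_two_links = unnormalised_log_score(X, [1, 0, 1], triads, alpha, mu)
score_three_links = unnormalised_log_score(X, [1, 1, 1], triads, alpha, mu)
print(score_two_links, score_three_links)
```

In the toy input, the labeling with two reciprocal links forms an unbalanced triad and receives no triad reward, while the all-reciprocal labeling is balanced and additionally collects `mu`.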
To solve the objective function, we adopt a gradient descent method (or a Newton-Raphson method). We use $\mu$ as the example to explain how we learn the parameters. Specifically, we first write the gradient of each $\mu_k$ with regard to the objective function (Eq. 5):

$$\frac{\partial O(\theta)}{\partial \mu_k} = E[h_k(Y_c)] - E_{P_{\mu_k}(Y_c \mid X, G)}[h_k(Y_c)] \qquad (7)$$

where $E[h_k(Y_c)]$ is the expectation of the factor function $h_k(Y_c)$ given the data distribution (essentially it can be considered as the average value of the factor function $h_k(Y_c)$ over all triads in the training data); and $E_{P_{\mu_k}(Y_c \mid X, G)}[h_k(Y_c)]$ is the expectation of the factor function $h_k(Y_c)$ under the distribution $P_{\mu_k}(Y_c \mid X, G)$ given by the estimated model. A similar gradient can be derived for parameter $\alpha_j$.

One challenge here is that the graphical structure in the TriFG model can be arbitrary and may contain cycles, which makes it intractable to directly calculate the marginal distribution $P_{\mu_k}(Y_c \mid X, G)$. A number of approximate algorithms can be considered, such as Loopy Belief Propagation (LBP) [2] and Mean-field [32]. We chose Loopy Belief Propagation due to its ease of implementation and effectiveness. Specifically, we approximate the marginal distribution $P_{\mu_k}(Y_c \mid X, G)$ using LBP. With the marginal probabilities, the gradient can be obtained by summing over all triads. It is worth noting that we need to perform the LBP process twice in each iteration, once for estimating the marginal distribution of the unknown variables $y_i = ?$ and once for the marginal distribution over all triads. Finally, with the gradient, we update each parameter with a learning rate $\eta$. The learning algorithm is summarized in Algorithm 1.

Predicting Follow-back. With the estimated parameters $\theta$, we can predict the labels of the unknown variables $\{y_i = ?\}$ by finding a label configuration which maximizes the objective function, i.e., $Y^\star = \arg\max O(Y \mid X, G, \theta)$. It is still intractable to obtain the exact solution.
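On a graph small enough to enumerate every label configuration, both the gradient of Eq. 7 and the prediction step can be computed exactly, which gives a reference point for what LBP approximates. The sketch below swaps LBP for brute-force enumeration, uses a fully labeled toy example for the gradient, and collapses the triad features into a single balance indicator with weight `mu`; these simplifications and all names are assumptions for illustration, not the authors' code.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Triad = Tuple[int, int, int]


def balanced_count(Y: Sequence[int], triads: List[Triad]) -> int:
    """Number of triads whose labels satisfy structural balance."""
    return sum(1 for t in triads if sum(Y[i] for i in t) in (1, 3))


def score(X, Y, triads: List[Triad], alpha, mu: float) -> float:
    """Unnormalized log-score: attribute terms alpha_j * x_ij * y_i plus triad terms."""
    attr = sum(a * x_ij * y for x, y in zip(X, Y) for a, x_ij in zip(alpha, x))
    return attr + mu * balanced_count(Y, triads)


def gradient_mu(X, Y_obs, triads: List[Triad], alpha, mu: float) -> float:
    """Eq. 7 by enumeration: observed balance count minus its model expectation."""
    configs = list(itertools.product((0, 1), repeat=len(Y_obs)))
    weights = [math.exp(score(X, Y, triads, alpha, mu)) for Y in configs]
    Z = sum(weights)
    expected = sum(w * balanced_count(Y, triads) for w, Y in zip(weights, configs)) / Z
    return balanced_count(Y_obs, triads) - expected


def predict(X, known: Dict[int, int], triads: List[Triad], alpha, mu: float) -> Dict[int, int]:
    """Exact marginals P(y_i = 1 | known labels); predict 1 when the marginal exceeds 0.5."""
    n = len(X)
    unknown = [i for i in range(n) if i not in known]
    total = 0.0
    mass_one = {i: 0.0 for i in unknown}
    for values in itertools.product((0, 1), repeat=len(unknown)):
        Y = [known.get(i, 0) for i in range(n)]
        for i, v in zip(unknown, values):
            Y[i] = v
        w = math.exp(score(X, Y, triads, alpha, mu))
        total += w
        for i in unknown:
            if Y[i] == 1:
                mass_one[i] += w
    return {i: int(mass_one[i] / total > 0.5) for i in unknown}


# Hypothetical toy graph: three edges forming one triad, one binary attribute each.
X = [[1], [1], [0]]
triads = [(0, 1, 2)]
grad_at_zero = gradient_mu(X, [1, 1, 1], triads, alpha=[0.0], mu=0.0)
prediction = predict(X, known={0: 1, 1: 1}, triads=triads, alpha=[1.0], mu=2.0)
print(grad_at_zero, prediction)
```

With all weights at zero the model is uniform, so half of the eight configurations are balanced and the gradient for the balanced observation is positive; a gradient step would therefore increase `mu`. Enumeration costs $2^n$ evaluations, which is exactly why an approximate method is needed on a real network.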
Again, we utilize loopy belief propagation to approximate the solution, i.e., to calculate the marginal distribution $P(y_i \mid x_i, G)$ of each relationship with an unknown variable, and finally assign each relationship the label with the maximal probability.

5. EXPERIMENTS
In this section, we first describe our experimental setup. We then present the performance results for different approaches in different settings. Next, we present several analyses and discussions. Finally, we use a case study to further demonstrate the advantage of the proposed model.

5.1 Experimental Setup
Prediction Setting. We use the data set described in Section 3 in our experiments. To quantitatively evaluate the effectiveness of the proposed model and compare it with alternative methods, we carefully select a sub-network from the data set which has a complete historical log of link formation among all users, i.e., each user is associated with a complete list of followers and of users they are following at each time stamp. The sub-network comprises 2,44 users, 468,238 following links among them, and 2,49,768 tweets. On average there are 3,337 new follow-back links per day. We divide the sub-network into 13 time stamps by viewing every four days as a time stamp. Our general task is to predict whether a user will follow another user back at the next time stamp after she receives a new following link from the other user. On closer study, however, we found that it is very challenging if we restrict the prediction to just the next time stamp. Figure 8 shows the distribution of the time span in which a user performs the follow-back action: although 6% of follow-backs are performed in the next time stamp, 37% of the follow-backs are still performed in the following three time stamps.

[Figure 8: Follow-back probability for different time stamps. X-axis: time stamp; Y-axis: follow-back probability.]
A further data analysis shows that active users often either perform an immediate follow-back (at the next time stamp) or decline to follow back, while some other (inactive) users may not log in to Twitter frequently, so the time span of their follow-backs varies a lot. According to this observation, in our first experiment, we use the network of the first 8 time stamps for training and predict follow-back actions in the following 4 (9th to 12th) time stamps (Test Case 1). Then we incrementally add the network of the 9th time stamp to the training data and again use the following 4 (10th to 13th) time stamps for prediction (Test Case 2). We report the prediction performance of the different approaches for each of the two test cases.

Comparison Methods. We compare the proposed TriFG model with the following methods:

SVM: it uses the attributes associated with each edge as features to train a classification model and then employs the classification model to predict edge labels in the test data. For SVM, we employ SVM-light.

LRC: it uses the same attributes associated with each edge as features to train a logistic regression classification model [6] and then predicts edge labels in the test data.

CRF-balance: it trains a Conditional Random Field [4] model with the attributes associated with each edge. The difference between this method and our model is that it does not consider structural balance factors.

CRF: it trains a Conditional Random Field model with all factors (including attributes and structural balance factors) and predicts edge labels in the test data.

TriFG: the proposed model, which trains a factor graph model with unlabeled data and all factors we defined in Section 4.

Weak TriFG (wTriFG): the difference between wTriFG and TriFG is that wTriFG does not consider status homophily and structural balance. We use this method to evaluate how social theories can help this task.

Among the five comparison methods, SVM and CRF-balance only consider attribute factors; wTriFG further considers unlabeled data. CRF considers all factors we defined, but does not consider unlabeled data.
Our proposed TriFG model considers all factors as well as the unlabeled data.

Evaluation Measures. We evaluate the performance of the different approaches in terms of Precision (Prec.), Recall (Rec.), F1-Measure (F1), and Accuracy (Accu.).
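For reference, the four measures can be computed from binary follow-back labels as in the following sketch. It assumes that label 1 means "follows back" and 0 means "does not follow back", and that the F-measure is the balanced harmonic mean of precision and recall; the function name and the toy labels are illustrative only.

```python
def evaluate(y_true, y_pred):
    """Return (precision, recall, f1, accuracy) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted follow-backs
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted, but no follow-back
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed follow-backs
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly predicted non-follow-backs

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, f1, accuracy


# Toy, made-up labels for five candidate follow-back links.
truth = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
precision, recall, f1, accuracy = evaluate(truth, predicted)
```

Because follow-backs are only one of the two classes, accuracy alone can hide poor recall on the follow-back class, which is why precision, recall, and the F-measure are reported alongside it.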
Table 1: Follow-back prediction performance of different methods in the two test cases. Test Case 1: predicting follow-back actions in the 9th-12th time stamps; Test Case 2: predicting follow-back actions in the 10th-13th time stamps. (Columns: Data, Algorithm, Prec., Rec., F1, Accu.; rows: SVM, LRC, CRF-balance, CRF, wTriFG, and TriFG for each test case.)

Table 2: Follow-back prediction performance of TriFG with three different algorithms (#degree, PageRank, and (α, β)) for distinguishing elite users from ordinary users. (Columns: Data, Algorithm, Prec., Rec., F1, Accu.; rows: (α, β), #degree, and PageRank for each test case.)

All algorithms are implemented in C++, and all experiments are performed on a PC running Windows 7 with an Intel(R) Core(TM) 2 CPU 66 (2.4GHz and 2.39GHz) and 4GB memory. All algorithms are efficient: the CPU time needed for training and prediction by all methods on the Twitter network ranges from 2 to 5 minutes.

5.2 Prediction Performance

We now describe the performance results for the different methods we considered. Table 1 shows the results in the two test cases (prediction performance for the 9th-12th time stamps and for the 10th-13th time stamps). It can be clearly seen that our proposed TriFG model significantly outperforms the four comparison methods. In terms of F1-Measure, TriFG achieves a +27% improvement compared with SVM. Compared with the other three graph-based methods, TriFG also yields an improvement of 22-25%. The advantage of TriFG mainly comes from the improvement in recall. One important reason is that TriFG can detect some difficult cases by leveraging the structural balance correlation and the homophily correlation. For example, without considering these two kinds of social correlation, the performance of wTriFG decreases to 772% in terms of F1-Measure in the two test cases. Another advantage of TriFG is that it makes use of the unlabeled data. Essentially, it further considers some latent correlations in the data set, which cannot be leveraged with only the labeled training data.
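To make the structural balance correlation more concrete, the sketch below checks whether a triad of users is balanced. It uses the classical formulation of structural balance theory (a triad is balanced when either all three pairs are friends or exactly one pair is), and it assumes, purely for illustration, that a reciprocal (two-way) link plays the role of a friendship. The function name and this encoding are assumptions, not the factor functions actually used in TriFG.

```python
from itertools import product


def is_balanced(ab, bc, ca):
    """Classical structural balance check for one triad.

    Each argument is True if the corresponding pair of users is linked
    by a reciprocal (two-way) relationship, False otherwise
    (an illustrative encoding, assumed here).
    """
    return sum((ab, bc, ca)) in (1, 3)


# Enumerate all 2**3 = 8 configurations of a triad and keep the balanced ones.
configurations = list(product((False, True), repeat=3))
balanced = [c for c in configurations if is_balanced(*c)]
```

Under this reading, a predictor that fills in a missing follow-back so that a triad moves from two reciprocal links to three turns an unbalanced structure into a balanced one, which is the kind of correlation a purely attribute-based classifier cannot exploit.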
5.3 Analysis and Discussion

Now, we perform several analyses to examine the following aspects of the TriFG model: (1) the contribution of different factors in the TriFG model; (2) the convergence property of the learning algorithm; (3) the effect of different settings for the time span; and (4) the effect of different algorithms for elite user finding.

Figure 9: Factor contribution analysis (F1-Measure in Test Case 1 and Test Case 2 for TriFG, TriFG-B, TriFG-BI, TriFG-BIS, and TriFG-BISL). TriFG-B stands for ignoring structural balance correlation. TriFG-BI stands for ignoring both structural balance correlation and implicit network correlation. TriFG-BIS stands for further ignoring status homophily, and TriFG-BISL stands for further ignoring link homophily.

Factor Contribution Analysis. In TriFG, we consider five different factor functions: Geographic distance (G), Link homophily (L), Status homophily (S), Implicit network correlation (I), and Structural balance correlation (B). Here we examine the contribution of the different factors defined in our model. We first rank the individual factors by their predictive power (see footnote 6), then remove them one by one in reverse order of their predictive power. In particular, we first remove structural balance correlation (denoted TriFG-B), followed by further removing the implicit network correlation (denoted TriFG-BI), status homophily (denoted TriFG-BIS), and finally link homophily (denoted TriFG-BISL). We train and evaluate the prediction performance of the different versions of TriFG. Figure 9 shows the average F1-Measure score of the different versions of the TriFG model. We can observe a clear drop in performance when ignoring each of the factors. This indicates that our method works well by combining the different factor functions and that each factor in our method contributes to the improvement in performance.

Convergence Property. We conduct an experiment to see the effect of the number of loopy belief propagation iterations. Figure 10 illustrates the convergence analysis results of the learning algorithm.
We see that on both test cases, the LBP-based learning algorithm converges in a small number of iterations. After only seven learning iterations, the prediction performance of TriFG on both test cases becomes stable. This suggests that the learning algorithm is very efficient and has a good convergence property.

Footnote 6: We did this by removing each particular factor from our model in turn and evaluating the decrease in the prediction performance of the TriFG model. A larger decrease means a higher predictive power.

Effect of Time Span. Figure 8 already shows the distribution of follow-backs across different time stamps. Now, we quantitatively examine how different settings for the time span affect the prediction performance. Figure 11 lists the average prediction performance of TriFG in the two test cases with different settings of the time span. It shows that when the time span is set to two or fewer time stamps, the prediction performance of TriFG drops sharply;
while when it is set to three time stamps, the performance is acceptable. The results are consistent with the statistics in Figure 8: more than 9% of follow-back actions are performed in the first three time stamps, and only about 8% of the follow-back actions are performed in the first two time stamps.

Figure 10: Convergence analysis of the learning algorithm (F1-Measure vs. #iterations for Test Case 1 and Test Case 2).

Figure 11: Follow-back prediction for different time stamps (F1-Measure in Test Case 1 and Test Case 2 for t = 1, t = 2, t = 3, and t = 4).

Effect of Different Algorithms for Elite User Finding. The status homophily factor depends on the results of elite user finding. We use three different algorithms, i.e., PageRank, #degree, and the (α, β) algorithm, to find elite users. Now we examine how the different algorithms affect the prediction performance. Table 2 shows the prediction performance of TriFG with the different elite-user-finding algorithms in the two test cases. Interestingly, though TriFG with the (α, β) algorithm achieves the best performance, the differences in performance among the three algorithms, especially in the second test case, are not that pronounced (with a difference of %4% in terms of F1-measure score). This confirms the effectiveness and generality of incorporating the status homophily factor into our TriFG model.

5.4 Qualitative Case Study

Now we present a case study to demonstrate the effectiveness of the proposed model. Figure 12 shows an example generated from our experiments. It represents a portion of the Twitter network from the 10th-13th time stamps. Black arrows indicate following links created 4 time stamps before (we use 4 time stamps as the time span for prediction). Blue arrows indicate new following links in the past 4 time stamps. Dashed arrows indicate follow-back links in our data set (a), predicted by SVM (b), and predicted by our model TriFG (c), with green denoting a correct prediction and red denoting a mistaken one. Red also marks places where there should be a follow-back link that the approach did not detect.
We look at specific examples to study why the proposed model can outperform the comparison methods. A, B, and C are three elite users identified using the (α, β) algorithm [8]. SVM correctly predicts that there is a follow-back link from C to B, but fails to predict the follow-back link from C to A. Our model TriFG correctly predicts both follow-back links. This is because TriFG leverages the structural balance factor: the structure among the three users that results from SVM's predictions is unbalanced, whereas TriFG tends to produce a balanced structure. It is also worth looking at the situation of user 9 and another user. TriFG made a mistake here: it does not predict the follow-back link, while the link was correctly predicted by SVM. The two users have a similar social status (similar in-degree) and are from the same time zone, so SVM successfully predicted the follow-back link. However, as the resulting structure is unbalanced, TriFG made a compromise and ended up with a mistaken prediction.

6. RELATED WORK

In this section, we review related work on link prediction and on Twitter studies in social networks. Our work is related to link prediction, which is one of the core tasks in social networks. Existing work on link prediction can be broadly grouped into two categories based on the learning methods employed: unsupervised link prediction and supervised link prediction. Unsupervised link prediction usually assigns scores to potential links based on the intuition that the more similar a pair of users are, the more likely they are to be linked. Various similarity measures of users are considered, such as preferential attachment [21] and the Katz measure [12]. A survey of unsupervised link prediction can be found in [17]. Recently, [18] designed a flow-based method for link prediction. There are also a number of works that employ supervised approaches to predict links in social networks, such as [28, 18, 2, 16]. Backstrom et al. [2] propose a supervised random walk algorithm to estimate the strength of social links. Leskovec et al.
[16] employ a logistic regression model to predict positive and negative links in online social networks. The main differences between existing work on link prediction and our work lie in two aspects. First, existing work handles undirected social networks, while we address the directed nature of the Twitter network and predict a directed link between a pair of users given an existing link in the other direction. Second, most existing models for link prediction are static. In contrast, our model is dynamic and learned from the evolution of the Twitter network. Moreover, we combine social theories (such as homophily and structural balance theory) into a semi-supervised learning model.

Another type of related work is social behavior analysis. Tang et al. [26] study the differences in social influence on different topics and propose Topical Affinity Propagation (TAP) to model topic-level social influence in social networks, and develop a parallel model learning algorithm based on the MapReduce programming model. Tan et al. [25] investigate how social actions evolve in a dynamic social network and propose a time-varying factor graph model for modeling and predicting users' social behaviors. The methods proposed in these works can be applied to the problem defined in this work, but the problem is fundamentally different.

There is little doubt that Twitter has intrigued netizens worldwide and the research community alike. Existing Twitter studies are mainly centered around the following three aspects: 1) The Twitter network. Java et al. [11] study the topological and geographical properties of the Twitter network. Their findings verify the homophily phenomenon that users with similar intentions connect with each other. Kwak et al. [13] conduct a similar study on the entire Twittersphere and observe some notable properties of Twitter, such as a non-power-law follower distribution, a short effective diameter, and low reciprocity, marking a deviation from known characteristics of human social networks. 2) The Twitter users. Work in this category mainly focuses on identifying influential users in Twitter [3, 3, 3] or examining and predicting the tweeting behaviors of users [, 25]. 3) The tweets. Sakaki et al. [24] propose to utilize the real-time nature of Twitter to detect a target event, while Mathioudakis and Koudas [19] present a system, TwitterMonitor, to detect emerging topics from Twitter content.

Figure 12: Case study. Portion of the Twitter network during the 10th-13th time stamps. Panels: (a) Ground truth; (b) SVM; (c) Our approach (TriFG). The two numbers associated with each user are, respectively, the number of followees and the number of followers. Black arrows indicate following links created 4 time stamps before (we use 4 time stamps as the time span for prediction). Blue arrows indicate new following links in the past 4 time stamps. Dashed arrows indicate follow-back links in our data set (a), predicted by SVM (b), and predicted by our model TriFG (c), with green denoting a correct prediction and red denoting a mistaken one. Red also marks places where there should be a follow-back link that the approach did not predict.

7. CONCLUSION

In this paper, we study the novel problem of two-way relationship prediction in social networks. We formally define the problem and propose a Triad Factor Graph (TriFG) model, which incorporates social theories into a semi-supervised learning model. We evaluate the proposed model on a large Twitter network.
We show that the proposed factor graph model can significantly improve the performance (+22% to +27% by F1-Measure) for two-way relationship prediction compared with several alternative methods. Our study also reveals several interesting phenomena.

The general problem of reciprocal relationship prediction represents a new and interesting research direction in social network analysis. There are many potential future directions for this work. First, other social theories can be further explored and validated for reciprocal relationship prediction. Looking farther ahead, it would also be interesting to develop a real suggestion system based on the proposed method and to validate the method using user feedback. We can also further study theoretical methodologies for improving the predictive performance by incorporating user interactions. Finally, building a theory of why and how users create relationships with each other in different kinds of networks is an intriguing direction for further research.

Acknowledgements. John Hopcroft was partially supported by the U.S. Air Force Office of Scientific Research under Grant FA . Jie Tang is supported by the Natural Science Foundation of China (No ) and Chinese National Key Foundation Research (No , No.6354 ). Tiancheng Lou is supported in part by the National Basic Research Program of China Grant 27CB879, 27CB879, the National Natural Science Foundation of China Grant 633, 66354,

REFERENCES
[1] L. Backstrom, R. Kumar, C. Marlow, J. Novak, and A. Tomkins. Preferential behavior in online groups. In WSDM '08, 2008.
[2] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011.
[3] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In ICWSM, 2010.
[4] D. J. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. PNAS, Dec. 2010.
[5] N. Eagle, A. S. Pentland, and D. Lazer.
Inferring social network structure using mobile phone data. PNAS, 106(36), 2009.
[6] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010.
[7] J. M. Hammersley and P. Clifford. Markov field on finite
graphs and lattices. Unpublished manuscript, 1971.
[8] J. He, J. E. Hopcroft, H. Liang, S. Suwajanakorn, and L. Wang. Detecting the structure of social networks using (α, β)-communities. In WAW.
[9] D. Horton and R. R. Wohl. Mass communication and para-social interaction: Observations on intimacy at a distance. Psychiatry, 1956.
[10] B. Huberman, D. M. Romero, and F. Wu. Social networks that matter: Twitter under the microscope. First Monday, 2009.
[11] A. Java, X. Song, T. Finin, and B. L. Tseng. Why we twitter: An analysis of a microblogging community. In WebKDD/SNA-KDD, 2007.
[12] L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39-43, 1953.
[13] H. Kwak, C. Lee, H. Park, and S. B. Moon. What is Twitter, a social network or a news media? In WWW, 2010.
[14] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
[15] P. F. Lazarsfeld and R. K. Merton. Friendship as a social process: A substantive and methodological analysis. In M. Berger, T. Abel, and C. H. Page, editors, Freedom and Control in Modern Society. New York: Van Nostrand, 1954.
[16] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW, 2010.
[17] D. Liben-Nowell and J. M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7), 2007.
[18] R. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In KDD, 2010.
[19] M. Mathioudakis and N. Koudas. TwitterMonitor: trend detection over the Twitter stream. In SIGMOD, 2010.
[20] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In UAI '99, 1999.
[21] M. E. J. Newman. Clustering and preferential attachment in growing networks. Phys. Rev. E, 64(2), 2001.
[22] L. Page, S. Brin, R. Motwani, and T. Winograd.
The PageRank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, Stanford University, 1999.
[23] D. M. Romero and J. M. Kleinberg. The directed closure process in hybrid social-information networks, with an analysis of link formation on Twitter. In ICWSM, 2010.
[24] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In WWW, 2010.
[25] C. Tan, J. Tang, J. Sun, Q. Lin, and F. Wang. Social action tracking via noise tolerant time-varying factor graphs. In KDD, 2010.
[26] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In KDD '09, 2009.
[27] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu, and J. Guo. Mining advisor-advisee relationships from research publication networks. In KDD, 2010.
[28] C. Wang, V. Satuluri, and S. Parthasarathy. Local probabilistic models for link prediction. In ICDM '07, 2007.
[29] M. Weber. The Nature of Social Action. In W. G. Runciman, editor, Weber: Selections in Translation. Cambridge University Press.
[30] J. Weng, E.-P. Lim, J. Jiang, and Q. He. TwitterRank: finding topic-sensitive influential twitterers. In WSDM, 2010.
[31] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on Twitter. In WWW, 2011.
[32] E. P. Xing, M. I. Jordan, and S. Russell. A generalized mean field algorithm for variational inference in exponential families. In UAI '03, 2003.
[33] Z. Yang, J. Guo, K. Cai, J. Tang, J. Li, L. Zhang, and Z. Su. Understanding retweeting behaviors in social networks. In CIKM, 2010.

Appendix: Factor function definition

This section describes how we define the factor functions in our experiments. In total, we define 25 features in five categories: Geographic distance, Link homophily, Status homophily, Structural balance, and Implicit network correlation.

Geographic distance. We use the Google Maps API to get the exact location (longitude and latitude) of some users.
Based on the two values, we define the following three features: the absolute distance and the time zone difference between the two users, and whether or not the two users are from the same country.

Link homophily. First, we treat each link as an undirected link and define the following four features: the number of common neighbors, the percentage of common neighbors for each of the two users, and the average percentage. Then we consider directed links and define another three features: the number of common two-way links, the number of common followers, and the number of common followees.

Status homophily. We also test whether two users have similar social status, and define the following three features: whether the two users are both elite users, one ordinary and one elite user, or both ordinary users.

Implicit network correlation. We consider the interactions between user A and user B, and define four features that respectively represent the number of retweets (replies) from A to B and from B to A.

Structural balance. Based on structural balance theory, as in Figure 5, we define eight features capturing all situations of structural balance theory for each triad.
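As an illustration of the link homophily category, the following Python sketch computes the seven features described above for a pair of users. The graph representation (a dictionary mapping each user to the set of users they follow), the feature names, and the choice to exclude the two users themselves from each other's neighbor sets are assumptions made for this sketch, not details taken from the original implementation.

```python
def link_homophily_features(following, a, b):
    """Seven link homophily features for the user pair (a, b).

    following: dict mapping each user to the set of users they follow
    (a hypothetical representation of the follower graph).
    """
    def followees(u):
        return set(following.get(u, set()))

    def followers(u):
        return {v for v, followed in following.items() if u in followed}

    pair = {a, b}  # assumption: the pair itself is not counted as a neighbor
    out_a, out_b = followees(a) - pair, followees(b) - pair
    in_a, in_b = followers(a) - pair, followers(b) - pair

    # Undirected view: a neighbor is anyone linked in either direction.
    nbr_a, nbr_b = out_a | in_a, out_b | in_b
    common = nbr_a & nbr_b
    pct_a = len(common) / len(nbr_a) if nbr_a else 0.0
    pct_b = len(common) / len(nbr_b) if nbr_b else 0.0

    # Directed view: two-way links, followers, and followees.
    two_way_a, two_way_b = out_a & in_a, out_b & in_b
    return {
        "common_neighbors": len(common),
        "pct_common_a": pct_a,
        "pct_common_b": pct_b,
        "avg_pct_common": (pct_a + pct_b) / 2,
        "common_two_way": len(two_way_a & two_way_b),
        "common_followers": len(in_a & in_b),
        "common_followees": len(out_a & out_b),
    }


# Toy, made-up follower graph.
toy_graph = {
    "a": {"c", "d"},
    "b": {"c", "a"},
    "c": {"a", "b"},
    "d": {"b"},
}
features = link_homophily_features(toy_graph, "a", "b")
```

In the toy graph, user c has a two-way link with both a and b, so it counts once as a common two-way link, a common follower, and a common followee, while user d only contributes to the undirected common-neighbor count.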