Social Influence and Its Models


Influence and Correlation in Social Networks

Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian
Yahoo! Research, 701 First Ave., Sunnyvale, CA
{aris,ravikumar,mahdian}@yahoo-inc.com

ABSTRACT

In many online social systems, social ties between users play an important role in dictating their behavior. One of the ways this can happen is through social influence, the phenomenon that the actions of a user can induce his/her friends to behave in a similar way. In systems where social influence exists, ideas, modes of behavior, or new technologies can diffuse through the network like an epidemic. Therefore, identifying and understanding social influence is of tremendous interest from both analysis and design points of view. This is a difficult task in general, since there are factors such as homophily or unobserved confounding variables that can induce statistical correlation between the actions of friends in a social network. Distinguishing influence from these is essentially the problem of distinguishing correlation from causality, a notoriously hard statistical problem. In this paper we study this problem systematically. We define fairly general models that replicate the aforementioned sources of social correlation. We then propose two simple tests that can identify influence as a source of social correlation when the time series of user actions is available. We give a theoretical justification of one of the tests by proving that with high probability it succeeds in ruling out influence in a rather general model of social correlation. We also simulate our tests on a number of examples designed by randomly generating actions of nodes on a real social network (from Flickr) according to one of several models. Simulation results confirm that our test performs well on these data. Finally, we apply the tests to real tagging data on Flickr, exhibiting that while there is significant social correlation in tagging behavior on this system, this correlation cannot be attributed to social influence.
Categories and Subject Descriptors: J.4 [Computer Applications]: Social and Behavioral Sciences - Sociology

General Terms: Economics, Human Factors

Keywords: Social influence, Social networks, Correlation, Tagging

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'08, August 24-27, 2008, Las Vegas, Nevada, USA. Copyright 2008 ACM /08/08...$

1. INTRODUCTION

Online social networks are playing an ever-important role in shaping the behavior of users on the web. Popular social sites such as Facebook, MySpace, Flickr, and del.icio.us are enjoying increasing traffic and are turning into community spaces, where users interact with their friends and acquaintances. The availability of such rich data at never-before-seen scales makes it possible to analyze user actions at an individual level in order to understand user behavior at large. In particular, questions about interpreting a user's action in the context of his/her online friends, and about correlating the actions of socially connected users, become highly interesting.

There has been some theoretical and empirical work on how a user's actions can be correlated to his/her social affiliations. Backstrom et al. [1] examined the membership problem in an online community. They observed correlation between the action of a user joining an online community and the number of friends who are already members of that community. Marlow et al. [5] considered the tag usage problem in Flickr and studied the set of tags placed by a user and those placed by the friends of the user. They exhibited a correlation between social connectivity and tag vocabulary. While these studies have established the existence of correlation between user actions and social affiliations, they do not address the source of the correlation.
Causes of correlation in social networks can be categorized into roughly three types. The first is influence (also known as induction), where the action of a user is triggered by one of his/her friends' recent actions. An example of this scenario is when a user buys a product because one of his/her friends has recently bought the same product. The second is homophily, which means that individuals often befriend others who are similar to them, and hence perform similar actions. For example, two individuals who own Xboxes are more likely to become friends due to the common interest. The third is environment (also known as confounding factors or external influence), where external factors are correlated both with the event that two individuals become friends and also with their actions. For example, two friends are likely to live in the same city, and therefore to post pictures of the same landmarks in an online photo sharing system.

From a practical point of view, identifying situations where social influence is the source of correlation is important. In the presence of social influence, an idea, norm of behavior, or a product diffuses through the social network like an epidemic. A marketing firm, for example, can use this information to design viral marketing campaigns or give out coupons to influential nodes in the network, or a system

designer can take advantage of this information in order to induce the users to follow a desired mode of behavior. There has already been significant research on methods for designing strategies to leverage social influence in such systems [3] and on the effect of influence on the growth pattern of new products [8]. The main idea in all viral marketing strategies is essentially that in cases where influence between users is prevalent, careful targeting can have a cascading effect on the adoption of a product/technology. Therefore, being able to identify in which cases influence prevails is an important step to strategy design.

Our contributions. Given the significance of social influence, it is important to be able to test if a given social system exhibits signs of social influence. This is a particularly difficult problem in online settings where individuals are often anonymous and therefore it is impossible to control for all potential confounding factors. We overcome this problem by taking advantage of the availability of data about the timing of actions in online settings. We propose a statistical test (called the shuffle test) based on the intuition that if influence is not a likely source of correlation in a system, timing of actions should not matter, and therefore reshuffling the time stamps of the actions should not significantly change the amount of correlation. We prove that in a rather general model of homophily and confounding, this test succeeds in ruling out influence as the source of social correlation.

We also show the effectiveness of our test using simulations. Our test cases are based on a large social network from Flickr. We generate the action data randomly from a model with or without social influence, and run our test on this data set to decide whether the correlation is caused by influence. Our results show that in nearly all cases our algorithm succeeds in identifying the source of correlation. We also present results for another test (called the edge-reversal test) inspired by a recent study on the spread of obesity in real-world social networks [2].
Finally, we apply our algorithms on real tagging data in Flickr. Our results show that even though tagging behavior in this system exhibits a considerable degree of social correlation, this cannot be attributed to social influence.

Organization. In Section 2 we detail the different forms of social correlation. In Section 3 we describe our methodology, and present a theoretical analysis in a model of homophily and confounding. We describe our data generation models and present the results of simulations in Section 4. We describe our experiments on Flickr tags in Section 5.

2. MODELS OF SOCIAL CORRELATION

We study a setting where a group of individuals (also called agents or users) are nodes of a social network G. In general, G is a directed graph and is generated from an unknown probability distribution. We are concerned with individuals performing a certain action for the first time, e.g., purchasing a product, visiting a web-page, or tagging a photo with a particular tag.¹ After an agent performs the action, we say that the agent has become active. We observe the system for a certain time period [0, T]. Let W denote the set of agents that are active at the end of this time period.

¹ In many cases, e.g., purchasing certain products or using certain tags, an individual might perform the action multiple times. We focus on the first time the action is performed by each individual, since subsequent occurrences of the same action by the same individual are often more dependent on the first occurrence than on the social network.

Social correlation, i.e., correlation between the behavior of affiliated agents in a social network, is a well-known phenomenon. Formally, this means that for two nodes u and v that are adjacent in G, the event that u becomes active is correlated with the event that v becomes active. There are three primary explanations for this phenomenon: homophily, the environment (or confounding factors), and social influence.

Homophily. Homophily is the tendency of individuals to choose friends with similar characteristics [4, 6].
This is a pervasive phenomenon, and not surprisingly, leads to correlation between the actions of adjacent nodes in a social network. For example, one plausible hypothesis for why there is social correlation in membership in an online community is that individuals might know each other and become friends after joining the community. Mathematically, in a pure homophily model, the set W of active nodes is first selected according to some distribution, and then the graph G is picked from a distribution that depends on W.

Confounding. The second explanation for correlation between actions of adjacent agents in a social network is external influence from elements in the environment (also referred to as confounding factors), which are more likely to affect individuals that are located close to each other in the social network. Mathematically, this means that there is a confounding variable X, and both the network G and the set of active individuals W come from distributions correlated with X. For example, two individuals who live in the same city are more likely to become friends than two random individuals, and they are also more likely to take pictures of similar scenery and post them on Flickr with the same tag.

Note that there is a fine distinction between this explanation and homophily: homophily refers to situations where the set W affects individuals' choices to become friends, while in confounding, both the choices of individuals to become friends and their choice to become active are affected by the same unobserved variable. It is possible to distinguish between these models by looking at the time when the edges of G are established. The focus of this paper, however, is on distinguishing social influence from other types of social correlation. Therefore, we study a common generalization of the confounding and the homophily model as follows: first, the pair (G, W) is selected according to a joint probability distribution, and then the time of activation for individuals in W is picked i.i.d. according to a distribution $\mathcal{T}$ on [0, T]. We call this model the correlation model.
The main assumption here is that the probability that an individual is active can be affected by whether their friends become active, but not by when they become active. This is in contrast with the influence model, as defined below.

Influence. The third, and perhaps the most consequential explanation for social correlation is social influence. This refers to the phenomenon that the action of individuals can induce their friends to act in a similar way. This can be through setting an example for their friends (as in the case of fashion), informing them about the action (as in the case of viral marketing), or increasing the value of an action for them (as in the case of adoption of a technology). Mathematically, this can be modeled as follows: first, the graph G is drawn according to some distribution. Then, in each of the time steps 1, ..., T, each non-active agent decides whether

to become active. The probability of becoming active for each agent u is a function p(x) of the number x of other agents v that have an edge to u and are already active.² Here, p(·) can be any increasing function, although later in the paper we consider a special class of functions that provides a good fit with the real data and also corresponds to a commonly used statistical model for estimating the probability of binary events, namely the logistic regression.

3. METHODOLOGY

In this section we present the methodology that we use to measure social correlation and test whether influence is a source of such correlation. We start in Section 3.1 by explaining how logistic regression can be used to quantify the extent of social correlation. In Section 3.2 we define the shuffle test for deciding if influence is a likely source of correlation, and prove that this test successfully rules out influence as the source of correlation in the correlation (confounding/homophily) model defined in Section 2. Finally, in Section 3.3 we define another test called the edge-reversal test, which we evaluate experimentally.

3.1 Measuring social correlation

The first step in our analysis is to obtain a measure of social correlation between the actions of an individual and that of her friends in the network. This measure is designed to recover the activation probability, assuming that the agents follow the influence model defined in Section 2. Recall that in the influence model, each individual flips an independent coin in every time step to decide whether or not to become active. In principle, the probability of this coin can vary from agent to agent and from time to time; in the simplest model, which is the focus of most of this paper, we measure this probability as a function of only one variable: the number of already-active friends the agent has.³ Note that the parameter we use is the number of friends that have become active at any earlier time step, as opposed to friends who have become active immediately before. This is because in online systems like Flickr actions are stored, and might be observed by others much later.
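As a minimal sketch of the influence model just described, the snippet below simulates discrete time steps in which every inactive agent flips a coin whose bias depends on the number of friends that became active at any earlier step. The function name `simulate_influence`, the `seeds` argument, and the toy line graph are illustrative assumptions, not part of the paper.

```python
import random

def simulate_influence(in_neighbors, p, T, seeds=(), rng=None):
    """Simulate the discrete-time influence model (illustrative sketch).

    in_neighbors: dict mapping each node u to the nodes v with an edge v -> u.
    p: activation probability as a function of the number of active friends.
    T: number of time steps; seeds: nodes assumed active at time 0.
    Returns a dict {node: activation step} (seeds get step 0).
    """
    rng = rng or random.Random()
    activated_at = {s: 0 for s in seeds}
    for t in range(1, T + 1):
        # Only friends activated at an earlier step count, so this step's
        # activations are collected first and merged afterwards.
        newly_active = []
        for u in in_neighbors:
            if u in activated_at:
                continue
            a = sum(1 for v in in_neighbors[u] if v in activated_at)
            if rng.random() < p(a):
                newly_active.append(u)
        for u in newly_active:
            activated_at[u] = t
    return activated_at

# Toy line graph 0 -> 1 -> 2 -> 3 with node 0 seeded and a deterministic p.
line = {0: [], 1: [0], 2: [1], 3: [2]}
times = simulate_influence(line, lambda a: 1.0 if a >= 1 else 0.0, T=5, seeds=[0])
never = simulate_influence(line, lambda a: 0.0, T=5, seeds=[0])
```

With this deterministic choice of `p`, activation travels one hop per step along the line, which makes the "earlier time step" convention easy to check by hand.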
As it turns out, for most tags in the Flickr data set, a logistic function with the logarithm of the number of friends as the explanatory variable provides a good fit for the probability. Therefore, for simplicity and to reduce the possibility of overfitting, we use the logistic function with this variable, that is, we estimate the probability p(a) of activation for an agent with a already-active friends as follows:⁴

$$p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}, \qquad (1)$$

where α and β are coefficients. Equivalently,

$$\ln\left(\frac{p(a)}{1 - p(a)}\right) = \alpha \ln(a+1) + \beta. \qquad (2)$$

The coefficient α measures social correlation: a large value of α indicates a large degree of correlation.

We estimate α, β using maximum likelihood logistic regression. More precisely, let $Y_{a,t}$ be the number of users who at the beginning of time t had a active friends and started using the tag at time t. Similarly, let $N_{a,t}$ be the number of users who at time t were inactive, had a active friends, but did not start using the tag (at time t). Finally, let $Y_a = \sum_t Y_{a,t}$ and $N_a = \sum_t N_{a,t}$. Then we compute the values of α and β that maximize the expression

$$\prod_a p(a)^{Y_a} (1 - p(a))^{N_a}, \qquad (3)$$

where p(a) is defined in (1). Typically, the values of $Y_a$ and $N_a$ decrease quickly and lose their statistical significance as a grows. Therefore, for practical reasons, we may restrict the likelihood expression (3) to only those $a \le R$, for a carefully chosen value of R, while we accumulate all the values corresponding to $a > R$ into $Y_{R+1}$ and $N_{R+1}$.

² This model assumes that time progresses in discrete steps. A similar model with continuous time can be defined using the Poisson distribution.

³ We also considered using the fraction of the total population that is active as another explanatory variable in our estimation on the Flickr data set, but the results indicated that this parameter is of no value: the corresponding coefficient is insignificant for almost all tags.

⁴ We have also duplicated some of our experiments using a as the explanatory variable. The results are not qualitatively different, and almost always the likelihood of the fit is better with the logarithmic variable.
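To make the maximization of expression (3) concrete, here is a minimal sketch that fits α and β by a plain Newton-Raphson iteration on the log-likelihood. This is a swapped-in, unsafeguarded technique chosen for illustration and is not the software used in the paper; the function name `fit_logistic` and the synthetic counts are assumptions.

```python
import math

def fit_logistic(Y, N, iterations=100, tol=1e-12):
    """Maximize prod_a p(a)^Y[a] * (1 - p(a))^N[a] over (alpha, beta).

    Y[a], N[a]: counts for a = 0, 1, ..., len(Y) - 1 (the last bucket may pool a > R).
    Plain Newton-Raphson from (0, 0) with no safeguards; it needs data for at
    least two distinct values of a, otherwise the Hessian is singular.
    """
    xs = [math.log(a + 1) for a in range(len(Y))]
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, n_neg in zip(xs, Y, N):
            n = y + n_neg
            p = 1.0 / (1.0 + math.exp(-(alpha * x + beta)))
            g_a += (y - n * p) * x      # d logL / d alpha
            g_b += (y - n * p)          # d logL / d beta
            w = n * p * (1.0 - p)       # weight in the negative Hessian
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        # Solve the 2x2 system (negative Hessian) * step = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        alpha, beta = alpha + step_a, beta + step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    return alpha, beta

# Synthetic counts that match model (1) exactly for alpha = 1, beta = -3.
true_alpha, true_beta = 1.0, -3.0
probs = [1 / (1 + math.exp(-(true_alpha * math.log(a + 1) + true_beta)))
         for a in range(6)]
Y = [1000 * q for q in probs]
N = [1000 * (1 - q) for q in probs]
alpha_hat, beta_hat = fit_logistic(Y, N)
```

Because the synthetic counts equal their expected values under the model, the gradient vanishes at the true parameters, so the fit should return them up to numerical error.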
While in general there is no closed form solution, there are many software packages that can solve such a problem quite efficiently; we used Matlab's statistics toolbox in our experiments.

3.2 The shuffle test

In this section we introduce the shuffle test for identifying social influence. It is based on the idea that if influence does not play a role, even though an agent's probability of activation could depend on her friends, the timing of such activation should be independent of the timing of other agents.

Let G be the social network, and $W = \{w_1, \dots, w_l\}$ be the set of users that are activated during the period [0, T]. Recall that in the correlation model, (G, W) is drawn from an arbitrary joint distribution. Assume that user $w_i$ is first activated at time $t_i$. Using the method in Section 3.1, we compute $Y_a$ and $N_a$, for $a \le R$, where R is a constant, and use the maximum likelihood method to estimate α. Next, we create a second problem instance with the same graph G and the same set W of active nodes, by picking a random permutation π of $\{1, \dots, l\}$, and setting the time of activation of node $w_i$ to $t'_i := t_{\pi(i)}$. Again we use the method in Section 3.1 to compute $Y'_a$ and $N'_a$ for $a \le R$, and the social correlation coefficient α'. The shuffle test declares that the model exhibits no social influence if the values of α and α' are close to each other.

Intuitively, the reason that the shuffle test correctly rules out social influence in instances generated according to the correlation model is the following: in an instance generated from this model, the time stamps $t_i$ are independent, identically distributed (i.i.d.) from a distribution $\mathcal{T}$ over [0, T]. The second instance constructed above only permutes all time stamps, and hence the new $t'_i$'s are still i.i.d. from the same distribution $\mathcal{T}$. Therefore, the two instances come from the exact same distribution, and hence they should lead to the same expected social correlation coefficient α.
The only thing that remains to be proven is that this coefficient is concentrated around its expectation (where the expectation is taken over the random choice of the time stamps, conditioning on a fixed choice of G and W). In the next section, we formalize this intuition, leading to Theorem 1.
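The counting and shuffling steps of the shuffle test can be sketched as follows. This is an illustration under stated assumptions: time steps are the integers 1..T, `in_neighbors` lists the agents with an edge into each node, and the helper names `count_tables` and `shuffle_times` are hypothetical. The resulting tables would then be passed to the logistic fit of Section 3.1 to obtain the two coefficients being compared.

```python
import random

def count_tables(in_neighbors, times, T, R):
    """Return (Y, N) as lists indexed by a = 0..R+1 (index R+1 pools a > R).

    in_neighbors: dict node -> nodes with an edge into it.
    times: dict node -> first activation step in 1..T (inactive nodes are absent).
    """
    Y = [0] * (R + 2)
    N = [0] * (R + 2)
    for u, friends in in_neighbors.items():
        last_step = times.get(u, T)
        for t in range(1, last_step + 1):
            # Friends count only if they became active strictly before step t.
            a = sum(1 for v in friends if v in times and times[v] < t)
            a = min(a, R + 1)
            if times.get(u) == t:
                Y[a] += 1   # u activates at step t with a active friends
            else:
                N[a] += 1   # u stays inactive at step t with a active friends
    return Y, N

def shuffle_times(times, rng):
    """Reassign the observed time stamps to the active nodes by a random permutation."""
    nodes = list(times)
    stamps = [times[u] for u in nodes]
    rng.shuffle(stamps)
    return dict(zip(nodes, stamps))

graph = {0: [], 1: [0], 2: [1]}
times = {0: 1, 1: 2, 2: 3}
Y, N = count_tables(graph, times, T=3, R=1)
shuffled = shuffle_times(times, random.Random(0))
Y_shuffled, N_shuffled = count_tables(graph, shuffled, T=3, R=1)
```

The shuffle keeps the graph, the active set, and the multiset of time stamps fixed, so any change between the two pairs of tables comes only from the ordering of activations.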

3.2.1 Theoretical analysis

To aid our analysis, we make three simplifying assumptions. First, we assume that the distribution $\mathcal{T}$ of the activation times is uniform over [0, T]. Second, we modify the test to pick each $t'_i$ independently from $\mathcal{T}$, instead of using a permutation of the original time stamps. Neither of these assumptions is necessary, but they simplify the arguments without substantively changing the techniques.

The third set of assumptions ensures that there are enough data to gather statistics. Let $d_i^-$ ($d_i^+$) be the indegree (outdegree) of node $w_i$, and let $d_i^{W-}$ ($d_i^{W+}$) be the indegree (outdegree) of node $w_i$ in the subgraph induced by W (recall that W is the set of users that became active). Also, let $W' = \{w_1, \dots, w_{l'}\}$, where $l' \ge l$, be the set of nodes in W and their neighbors (note that the first l nodes are those in W). Then we make the following assumptions:

1. $l = \Theta(n)$.
2. $d_i^-, d_i^+ \le d_{\max}$, for $i \le l'$ and for some constant $d_{\max}$.
3. $|\{i : d_i^{W-} \ge R + 1\}| = \Theta(n)$.

These assumptions are not the strictest possible for our results to hold, but they are nevertheless quite natural and simple to state. In particular, we make the first assumption only to simplify the notation (otherwise the results hold with probabilities that depend on l and l' instead of n).

Theorem 1. Let G = (V, E) be a directed graph on n nodes and let $W = \{w_1, \dots, w_l\} \subseteq V$ be the set of nodes that become active during the time period [0, T]. Assume that the activation time $t_i$ of the node $w_i$ is picked i.i.d. from the uniform distribution over $\{1, \dots, T\}$, and assume that the three assumptions hold. Let α denote the social correlation coefficient computed using the method in Section 3.1. Then, with high probability⁵ the value of α is close to its expectation, where the probabilities are over random choices of the activation times.

Proof. The main part of the proof is Lemma 2, where we show that the values of $Y_a$ and $N_a$ are concentrated. This is proved using concentration inequalities for martingales.
We can then show (details deferred to the full version of the work) that when we apply logistic regression with inputs that are close to each other, the social correlation values α recovered are also close to each other. Therefore, with high probability the value of α recovered is close to its expectation.

⁵ The term with high probability, abbreviated whp., refers to an event that holds with probability that tends to 1 as $n \to \infty$.

Lemma 2. Assume the conditions of Theorem 1, and let $Y_a$ and $N_a$, $a \le R + 1$, be defined as in Section 3.1. Then we have that $Y_a$ and $N_a$ are close to their expectations whp.

Proof. First we calculate $E[Y_a]$, for a fixed a. We introduce some notation. Let $Y_a^i = 1$ if node $w_i$ had a active neighbors when it used the tag, and 0 otherwise. Notice that we have $Y_a = \sum_{i=1}^{l} Y_a^i$. The probability that exactly a of the neighbors are active when node $w_i$ used the tag is 0 if $d_i^{W-} < a$. Otherwise, if $a \le R$, this probability is $1/(d_i^{W-} + 1)$, since node $w_i$ and its neighbors have the same probability to be the (a + 1)th node among them that used the tag. Finally, if $a = R + 1$ (recall that R + 1 corresponds to the ensemble of all the values greater than R), then the probability is $(d_i^{W-} - R)/(d_i^{W-} + 1)$. Thus, we have

$$E[Y_a] = \sum_{i=1}^{l} E[Y_a^i] = \sum_{i :\, d_i^{W-} \ge a} \frac{1}{d_i^{W-} + 1}, \quad \text{for } a \le R,$$

and

$$E[Y_a] = \sum_{i=1}^{l} E[Y_a^i] = \sum_{i :\, d_i^{W-} \ge R + 1} \frac{d_i^{W-} - R}{d_i^{W-} + 1}, \quad \text{for } a = R + 1.$$

One can verify that from our assumptions we have that both of these quantities are $\Theta(n)$.

Note that the terms $Y_a^i$ are not independent. Thus, to show concentration, we will employ Azuma's inequality [7]. For a fixed a we define the (Doob's) martingale $X_i = E[Y_a \mid t_1, t_2, \dots, t_i]$. We have that $X_0 = E[Y_a]$ and $X_l = Y_a$. Note that we have $|X_i - X_{i-1}| \le d_i^{W+} + 1$, since a node affects only itself and the nodes for which it is a contact. Then Azuma's inequality implies that

$$\Pr(|Y_a - E[Y_a]| > \lambda) = \Pr(|X_l - X_0| > \lambda) \le 2 \exp\left(-\frac{\lambda^2}{2 \sum_{i=1}^{l} (d_i^{W+} + 1)^2}\right),$$

which is o(1) for $\lambda = \omega(\sqrt{n})$.

To compute the value of $E[N_a]$ we have to be a bit more careful, since a node can contribute multiple time periods to $N_a$. First, note that we have to count also the neighbors of the nodes in W.
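Recall that $W' = \{w_1, \dots, w_{l'}\}$ is the set of active nodes and their neighbors. Let us write $N_a = \sum_{i=1}^{l'} N_a^i$, where $N_a^i$ counts the number of time steps during which node $w_i$ had not yet become active (if it becomes active at all) and had exactly a active contacts. Let us compute $E[N_a^i]$, first for $i \le l$. Of course, this equals 0 if $d_i^{W-} < a$. Otherwise, the expected time until one of the $d_i^{W-} + 1$ nodes ($w_i$ and its contacts) becomes activated is $T/(d_i^{W-} + 2)$, thus

$$E[N_0^i] = \frac{T}{d_i^{W-} + 2}.$$

With probability $d_i^{W-}/(d_i^{W-} + 1)$ the first node is not $w_i$, hence we have

$$E[N_1^i] = \frac{d_i^{W-}}{d_i^{W-} + 1} \cdot \frac{T}{d_i^{W-} + 2}.$$

More generally we get that

$$E[N_a^i] = \frac{d_i^{W-} - a + 1}{d_i^{W-} + 1} \cdot \frac{T}{d_i^{W-} + 2},$$

for $a \le R$, and

$$E[N_a^i] = \frac{d_i^{W-} - R}{d_i^{W-} + 1} \cdot \frac{(d_i^{W-} - R + 1)\, T}{2\,(d_i^{W-} + 2)},$$

for $a = R + 1$. (The first fraction is the probability that $w_i$ becomes activated after R + 1 neighbors, and then it is expected to arrive in the middle of the leftover period.) For $i > l$ we can show with similar arguments that $E[N_a^i] = 0$ if $d_i^{W-} < a$, otherwise

$$E[N_a^i] = \frac{T}{d_i^{W-} + 1},$$

for $a \le R$, and

$$E[N_a^i] = \frac{(d_i^{W-} - R)\, T}{d_i^{W-} + 1},$$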

for $a = R + 1$. By our assumptions for the graph we have that $E[N_a] = \sum_{i=1}^{l'} E[N_a^i] = \Theta(Tn)$.

Again we show concentration by using the Azuma inequality. We define $Z_i = E[N_a \mid t_1, t_2, \dots, t_i]$, and notice that we have $|Z_i - Z_{i-1}| \le T (d_i^+ + 1)$, with the same reasoning as previously. So we get that

$$\Pr(|N_a - E[N_a]| > \lambda) = \Pr(|Z_l - Z_0| > \lambda) \le 2 \exp\left(-\frac{\lambda^2}{2 T^2 \sum_{i=1}^{l} (d_i^+ + 1)^2}\right),$$

which is o(1) for $\lambda = \omega(T\sqrt{n})$.

3.2.2 Detecting influence

We showed that the values of α that we obtain with the correlation model are close to each other with high probability with and without the timestep shuffle. Now we contrast this with the influence model and we show that in the latter case the values of α that we compute with and without the timestep shuffle are in general different. We demonstrate this fact with a simple example.

Consider a line graph with n + 1 nodes, $v_0, v_1, v_2, \dots, v_n$, and edge set $\{(v_i, v_{i+1});\ i = 0, 1, \dots, n - 1\}$. For simplicity we assume that node $v_0$ has initially used the tag; this does not change the nature of our example. For some $p \in [0, 1]$, consider now the influence model with $\alpha = \log_2(p/(1 - p))$ and $\beta = 0$, and suppose we observe the system for T time steps (with Tp being sufficiently small, say Tp < n/2). During the T steps, the nodes will start to use the tag from left to right, and at each step, the probability that the leftmost inactive node will become active equals p. Then at the end of the T steps, if the number of new active nodes is denoted by L, we have $E[Y_1] = E[L] = Tp$ and $E[N_1] = T(1 - p)$.

Assume now that we perform the shuffle test. Then for $i = 1, \dots, L$, let $Y_1^i$ be 1 if node $v_i$ became active after node $v_{i-1}$, and let $N_1^i$ be the number of time steps during which node $v_i$ did not become active although node $v_{i-1}$ was active (0 if node $v_{i-1}$ became active after node $v_i$). Then we have $Y_1 = \sum_{i=1}^{L} Y_1^i$ and $E[Y_1] = E[L/2] = Tp/2$, since the probability that node $v_{i-1}$ becomes active before node $v_i$ is 1/2. Similarly, $N_1 = \sum_{i=1}^{L} N_1^i$ and

$$E[N_1^i] = \sum_{j=1}^{T} \frac{1}{T} \cdot \frac{j}{Tp} \cdot \left(\frac{j}{2} - 1\right).$$
This follows since node $v_i$ becomes active at time step j with probability 1/T, the probability that node $v_{i-1}$ arrives before is $j/(Tp)$, and in that case the arrival time is uniformly distributed in [0, j], so the expected number of time steps during which node $v_i$ does not become active is $j/2 - 1$. Therefore,

$$E[N_1^i] = \frac{(T + 1)(2T - 5)}{12\,Tp},$$

and so, since $E[L] = Tp$,

$$E[N_1] = \frac{(T + 1)(2T - 5)}{12}.$$

Hence we see that the input to the regression function is in general very different and as a result the values of α will in general be very different.

3.3 The edge-reversal test

In this section we introduce the second test for distinguishing influence, similar to the one used in the obesity study [2]: we reverse the direction of all the edges and run logistic regression on the data using the new graph (which we call the reverse graph) as well.⁶ Since other forms of social correlation (other than social influence) are only based on the fact that two friends often share common characteristics or are affected by the same external variables, and are independent of which of these two individuals has named the other as a friend, we intuitively expect reversing the edges not to change our estimate of the social correlation significantly. On the other hand, social influence spreads in the direction specified by the edges of the graph, and hence reversing the edges should intuitively change the estimate of the correlation. We will test this hypothesis on several classes of instances generated using probabilistic models of different forms of social correlation.

4. SIMULATIONS

4.1 Generative models

To verify the validity of the techniques described in Section 3, we define three generative models: one corresponding to a setting where there is no social correlation, one corresponding to a setting where there is only social influence, and one where there is social correlation but not influence. In each model, we will try to keep other aspects of the model as close to Flickr's data as possible.
In particular, in all models the network (both number of users and connections) grows at the same rate as in the real Flickr data, and we will try to let the number of users that become active in each time step follow the pattern corresponding to a tag in the real data.

The first model concerns a setting where there is no social correlation, influence or otherwise, in the pattern of activations. The second model is for a setting where influence is the only form of social correlation; this model is defined to match the logistic regression model described earlier. The third model seeks to capture situations where agents that are close to each other in the network are affected by the same external factors (the environment) that make them more likely to be activated. We now describe the models.

The no-correlation model. For every tag in the real data, we can generate a no-correlation instance as follows: the network grows exactly in the same way as in the real data. In each time step, we look at the real data to see how many new agents use the tag, and pick the same number of agents uniformly at random from the set of agents that have already joined the network and have not been picked yet.

The influence model. This model is parameterized in terms of two parameters, α and β. The network, and the growth pattern of the network, is kept as in the real data. In every time step, each node in the set of nodes that have joined the network but not activated yet flips a coin independently to decide whether to become active in this time step. The probability of activation for this node is computed using (2), where a is the number of friends of this node that have become active in one of the previous time steps.

⁶ Note that we are only able to use this test because in the Flickr data set, a significant number of edges are directed.
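The no-correlation model above can be sketched in a few lines. This is an illustration only: the function name `no_correlation_instance`, the `joined_at` join times, and the toy numbers are assumptions, and capping the sample size when too few agents are eligible is a choice of this sketch that the text does not specify.

```python
import random

def no_correlation_instance(joined_at, new_users_per_step, rng):
    """Generate activation steps that ignore the network structure.

    joined_at: dict node -> step at which the node joined the network.
    new_users_per_step: list whose entry t is the number of users who first
        used the tag at step t in the reference data.
    Returns a dict {node: activation step}.
    """
    activated_at = {}
    for t, k in enumerate(new_users_per_step):
        # Eligible agents have joined by step t and have not been picked yet.
        eligible = sorted(u for u, j in joined_at.items()
                          if j <= t and u not in activated_at)
        # Assumption of this sketch: never ask for more agents than are eligible.
        for u in rng.sample(eligible, min(k, len(eligible))):
            activated_at[u] = t
    return activated_at

joined_at = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
instance = no_correlation_instance(joined_at, [1, 2, 1], random.Random(7))
per_step = [sum(1 for t in instance.values() if t == s) for s in range(3)]
```

By construction the per-step activation counts follow the reference pattern while the choice of agents is independent of who is connected to whom.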

Figure 1: Distribution of α for the no-correlation model. (a) Histogram. (b) Empirical CDF.

Figure 2: Distribution of α for the influence model. (a) Histogram. (b) Empirical CDF.

The correlation (no-influence) model. Again, we keep the network and the pattern of growth of the population the same as in the real data. The model is parameterized in terms of one parameter L, and follows the pattern of a given tag in the real data. Before generating the action data, we select a set S of nodes by sequentially picking a number of centers at random, and adding a ball of radius 2 around each to S.⁷ We stop this process as soon as the size of S reaches the prespecified number L. Then, we generate the set of agents that become active in each time step in a manner similar to the one in the no-correlation model, except that in each time step we pick the set of agents to become active uniformly at random from S.

4.2 Measuring correlation

Our first set of experiments focuses on the measurement of correlation in the network. In Figure 1 we display the results of the application of logistic regression to the no-correlation model. We can see that the distribution of the values of α is centered at zero and most of the mass is around there.

In Figure 2 we can see the application of the logistic regression to the influence model. Recall from Section 3 that this model is based on the logistic function, which we are trying to fit. Not surprisingly, we recover the values of α that we set in our model. Thus, Figure 2 essentially displays those values of α.

Finally, in Figure 3 we see the results in the correlation model. Note that here as well the values of α that we recover are positive.

4.3 Distinguishing influence

After establishing the presence of correlation in users' behavior, we turn to tests for the source of this correlation. First we apply the shuffle test and then we turn to the edge-reversal test.

4.3.1 Shuffle test

Let us first observe the influence model, where the values of α with the original tagging times are high.
From the intuition gained in Section 3.2, we expect to see those values decrease when we shuffle the tagging timesteps. In Figure 4(a) we can observe the results for some of the tags. Notice how the cumulative distribution function (CDF) is shifted to the left, which means that when we shuffle the tagging timesteps the value of α decreases. In Figure 4(b) we can see the values in absolute terms.

⁷ We have chosen a radius of 2 here because the network is highly connected: a ball of radius 3 can become very large, while a ball of radius 1 only consists of the neighbors of a node, which is often too small.

Figure 3: Distribution of α for the correlation model. (a) Histogram. (b) Empirical CDF.

Now we switch to the correlation model. According to the analytical findings of Section 3.2, the values of α that we obtain with and without the shuffling should, with high probability, not differ. Figure 5 confirms our analytical findings and shows that for almost all tags the values of α retrieved are very close with and without the shuffle.

4.3.2 Edge-reversal test

Now we present the results of our second influence-detection test, the edge reversal, confirming the results of the previous section. First we apply it to the influence model, depicting the results in Figure 6. Similarly to the previous test, there is a significant difference in the values of α in the forward and backward direction.

On the contrary, in the correlation model, as seen in Figure 7, the values of α essentially coincide. In Figure 7(a) we can notice that the two CDFs essentially coincide. In Figure 7(b) we see a more detailed picture. Here every point corresponds to a tag, and the graph shows the value of α in the network versus the value of α in the network with the edges reversed. Take notice of the proximity of the points to the line y = x.
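The graph manipulation behind the edge-reversal test is simple enough to sketch. The snippet below is a toy illustration, assuming an edge (v, u) is read as "v can influence u" and that the edge list has no duplicates; the helper names are hypothetical. It counts, for each active node, how many in-neighbors were activated strictly earlier, in the original and in the reverse graph.

```python
def reverse_edges(edges):
    """Return the reverse graph: every directed edge (v, u) becomes (u, v)."""
    return [(u, v) for (v, u) in edges]

def active_friends_at_activation(edges, times):
    """For each active node, count in-neighbors that were activated strictly earlier.

    edges: list of directed pairs (v, u), read as "v can influence u".
    times: dict node -> activation time.
    """
    counts = {u: 0 for u in times}
    for v, u in edges:
        if u in times and v in times and times[v] < times[u]:
            counts[u] += 1
    return counts

# Activations that follow the edge direction, as influence would produce.
edges = [(0, 1), (1, 2)]
times = {0: 1, 1: 2, 2: 3}
forward = active_friends_at_activation(edges, times)
backward = active_friends_at_activation(reverse_edges(edges), times)
```

In this toy cascade every non-seed node has an already-active friend in the forward graph and none in the reverse graph, which is the kind of asymmetry the test looks for; under pure correlation no such systematic asymmetry is expected.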

Figure 4: Shuffle test for the influence model. (a) Empirical probability density. (b) α of original and shuffled tagging timesteps.

Figure 5: Shuffle test for the correlation model. (a) Empirical probability density. (b) α of original and shuffled tagging timesteps.

Figure 6: Edge-reversal test for the influence model. (a) Empirical probability density. (b) α of direct vs. reversed edges.

Figure 7: Edge-reversal test for the correlation model. (a) Empirical probability density. (b) α of direct vs. reversed edges.

5. EXPERIMENTS ON REAL DATA

After verifying that our techniques are effective for the simulated data, we apply them on real-world data, namely on the Flickr social network. First we describe the data set. Then we show that there is positive correlation in the users' behavior. Finally, we address the issue of the source of correlation. We apply the tests of Section 3, and we conclude that influence is not a likely source of the correlation.

5.1 The Flickr dataset

We analyzed the tagging behavior of users for a period of 16 months. The final number of users was about 800K. Since the majority of users did not exhibit any tagging behavior at all, we restricted our attention to the set of users who have tagged any photo with any tag, which is about 340K users. Looking at this subgraph at the end of the 16-month period, the size of the giant component is 160K users, the second one has size 16, and there are 165K isolated users. The number of directed edges between the users is 2.8M and, on average, for a given user u, the proportion of u's contacts that do not have u as a contact is 28.5%. In Figure 8 we depict the size of the subgraph that we analyze as a function of time. (The growth rate of the entire network exhibits a very similar behavior.)

Out of a collection of about 10K tags that users had used, we selected a set of 1,700, and analyzed each of them independently.
We selected tags of varous types (event, colors, objects, etc.), varous numbers of users (most of them were used by more than 1, 000 users), and varous growth patterns: bursty (e.g., halloween, katrna ), smooth (e.g., photos, ) and perodc (e.g., moon ). 5.2 Measurng correlaton Frst we confrm the exstence of correlaton n the Flckr data set as expected. In Fgure 9 we can see the dstrbuton of α along the tags of Flckr. Note that for almost all the tags the value s hgher than 1, suggestng that correlaton s prevalent n users taggng actvtes for almost all the tags. Ths correlaton s not necessarly due to socal nfluence; we examne ths ssue next. 5.3 Dstngushng nfluence After establshng the presence of correlaton n users behavor, we turn to the test for the source of ths correlaton.

Figure 8: Growth of the Flickr network.

Figure 9: Distribution of α for the Flickr social network. (a) Histogram. (b) Empirical CDF.

Figure 10: Shuffle test for the Flickr social network. (a) Empirical probability density. (b) α of original timesteps vs. shuffled timesteps.

Figure 11: Edge-reversal test for the Flickr social network. (a) Empirical probability density. (b) α of direct vs. reversed edges.

First we apply the shuffle test and then we turn to the edge-reversal test. In Figure 10 we show the results of applying the shuffle test on the Flickr data set. In Figure 10(a), notice that the two cumulative distribution functions essentially coincide. It seems that the correlation that we observed in Section 5.2 cannot be attributed to influence. This indicates that either users do not tend to browse their contacts' photos to a large extent, or even when they browse, they do not tend to start using the tags they see. In Figure 10(b) we see more details. Once again, every point corresponds to a tag, and the graph shows the value of α in the Flickr network with the original timesteps versus the value of α with the timesteps shuffled. As before, notice the striking proximity of the points to the line y = x. Finally, in Figure 11 we observe the results of applying the edge-reversal test to the Flickr network, which once again confirms all our previous observations.

5.4 Some influence in Flickr

While it is true that influence does not play an important role in users' tagging behavior in Flickr, we can actually discover that there is some limited effect by looking at the difference between similar tags. As a concrete example, consider the tag "graffiti"; the difference between the values of α in the two edge directions is essentially 0. A lot of users used the misspelled tag "grafitti". Here the difference turns out to be slightly larger (still small, though). It is easy to imagine that indeed there is some propagation of the misspelled version. (The analogy with the TA who grades two homeworks with the same mistakes should make this concept clear!) Finally, with a third, even less common spelling ("graffitti"), the difference increased yet more.

6. CONCLUSIONS

In this paper we applied statistical analysis on the data from a large social system in order to identify and measure social influence as a source of correlation between the actions of individuals with social ties. This is an instance of the age-old problem of distinguishing correlation from causation. This problem is very difficult in general; however, in our case, we used the availability of data about the timestep of each action, as well as asymmetric social ties between the agents, in order to study this problem.

There are still many interesting open directions left for future research. First, our techniques provide only a qualitative indication of the existence of influence and not a quantitative measure. Furthermore, we do not provide any formal verification of our results. For example, is it indeed the case that in Flickr users' tagging behavior, influence has a limited role? Or, can we pinpoint social networks and behaviors where influence is indeed prevalent and verify our tests? Also, what happens when different sources of social correlation are present, as is usually the case? All these important questions might be tricky to answer and probably require the design of controlled user experiments.

Furthermore, it would be very interesting to extend our theoretical model for distinguishing between social influence and other forms of correlation in social networks. Under what conditions is the information about the timestep of events enough to achieve this goal? How can the pattern of the spread of an action be used to identify social influence even in a setting where all social ties are symmetric? How can we find an influential node just by looking at the data about the spread of an action?
Given the great potential of viral marketing technologies to shape the future of marketing on the Internet, this and many other related questions are of tremendous practical value.

Acknowledgments

We thank Alex Jaffe, Malcolm Slaney, and Duncan Watts for invaluable discussions, as well as the anonymous reviewers for insightful comments.

7. REFERENCES

[1] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: Membership, growth, and evolution. In 12th KDD, pages 44-54.
[2] N. A. Christakis and J. H. Fowler. The spread of obesity in a large social network over 32 years. The New England Journal of Medicine, 357(4).
[3] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In 9th KDD.
[4] P. Lazarsfeld and R. K. Merton. Friendship as a social process: A substantive and methodological analysis. In M. Berger, T. Abel, and C. H. Page, editors, Freedom and Control in Modern Society. Van Nostrand.
[5] C. Marlow, M. Naaman, D. Boyd, and M. Davis. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In 17th HYPERTEXT, pages 31-40.
[6] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27.
[7] M. Mitzenmacher and E. Upfal. Probability and Computing. Cambridge University Press.
[8] P. Young. The diffusion of innovations in social networks. In L. E. Blume and S. N. Durlauf, editors, The Economy as a Complex Evolving System, volume III. Oxford University Press, 2003.


More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo.

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo. ICSV4 Carns Australa 9- July, 007 RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL Yaoq FENG, Hanpng QIU Dynamc Test Laboratory, BISEE Chna Academy of Space Technology (CAST) yaoq.feng@yahoo.com Abstract

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Covariate-based pricing of automobile insurance

Covariate-based pricing of automobile insurance Insurance Markets and Companes: Analyses and Actuaral Computatons, Volume 1, Issue 2, 2010 José Antono Ordaz (Span), María del Carmen Melgar (Span) Covarate-based prcng of automoble nsurance Abstract Ths

More information

Network Formation and the Structure of the Commercial World Wide Web

Network Formation and the Structure of the Commercial World Wide Web Network Formaton and the Structure of the Commercal World Wde Web Zsolt Katona and Mklos Sarvary September 5, 2007 Zsolt Katona s a Ph.D. student and Mklos Sarvary s Professor of Marketng at INSEAD, Bd.

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Transition Matrix Models of Consumer Credit Ratings

Transition Matrix Models of Consumer Credit Ratings Transton Matrx Models of Consumer Credt Ratngs Abstract Although the corporate credt rsk lterature has many studes modellng the change n the credt rsk of corporate bonds over tme, there s far less analyss

More information

Rate-Based Daily Arrival Process Models with Application to Call Centers

Rate-Based Daily Arrival Process Models with Application to Call Centers Submtted to Operatons Research manuscrpt (Please, provde the manuscrpt number!) Authors are encouraged to submt new papers to INFORMS journals by means of a style fle template, whch ncludes the journal

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Student Performance in Online Quizzes as a Function of Time in Undergraduate Financial Management Courses

Student Performance in Online Quizzes as a Function of Time in Undergraduate Financial Management Courses Student Performance n Onlne Quzzes as a Functon of Tme n Undergraduate Fnancal Management Courses Olver Schnusenberg The Unversty of North Florda ABSTRACT An nterestng research queston n lght of recent

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

7.5. Present Value of an Annuity. Investigate

7.5. Present Value of an Annuity. Investigate 7.5 Present Value of an Annuty Owen and Anna are approachng retrement and are puttng ther fnances n order. They have worked hard and nvested ther earnngs so that they now have a large amount of money on

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

The EigenTrust Algorithm for Reputation Management in P2P Networks

The EigenTrust Algorithm for Reputation Management in P2P Networks The EgenTrust Algorthm for Reputaton Management n P2P Networks Sepandar D. Kamvar Stanford Unversty sdkamvar@stanford.edu Maro T. Schlosser Stanford Unversty schloss@db.stanford.edu Hector Garca-Molna

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

The impact of hard discount control mechanism on the discount volatility of UK closed-end funds

The impact of hard discount control mechanism on the discount volatility of UK closed-end funds Investment Management and Fnancal Innovatons, Volume 10, Issue 3, 2013 Ahmed F. Salhn (Egypt) The mpact of hard dscount control mechansm on the dscount volatlty of UK closed-end funds Abstract The mpact

More information

Framing and cooperation in public good games : an experiment with an interior solution 1

Framing and cooperation in public good games : an experiment with an interior solution 1 Framng and cooperaton n publc good games : an experment wth an nteror soluton Marc Wllnger, Anthony Zegelmeyer Bureau d Econome Théorque et Applquée, Unversté Lous Pasteur, 38 boulevard d Anvers, 67000

More information

RequIn, a tool for fast web traffic inference

RequIn, a tool for fast web traffic inference RequIn, a tool for fast web traffc nference Olver aul, Jean Etenne Kba GET/INT, LOR Department 9 rue Charles Fourer 90 Evry, France Olver.aul@nt-evry.fr, Jean-Etenne.Kba@nt-evry.fr Abstract As networked

More information

Automating Analysis of Large-Scale Botnet Probing Events

Automating Analysis of Large-Scale Botnet Probing Events Automatng Analyss of Large-Scale Botnet Probng Events Zhchun L, Anup Goyal and Yan Chen Northwestern Unversty 2145 Sherdan Road Evanston, IL, USA {lzc,ago210,ychen}@cs.northwestern.edu Vern Paxson UC Berkeley

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,

More information

Faraday's Law of Induction

Faraday's Law of Induction Introducton Faraday's Law o Inducton In ths lab, you wll study Faraday's Law o nducton usng a wand wth col whch swngs through a magnetc eld. You wll also examne converson o mechanc energy nto electrc energy

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information