Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups

Size: px
Start display at page:

Download "Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups"

Transcription

1 Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups Lujun Fang University of Michigan 2260 Hayward Avenue Ann Arbor, MI ljfang@umich.edu Alex Fabrikant Google Research 1600 Amphitheatre Pkwy Mountain View, CA fabrikant@google.com Kristen LeFevre Google Research 1600 Amphitheatre Pkwy Mountain View, CA klefevre@google.com ABSTRACT Online social networks like Google+, Twitter, and Facebook allow users to build, organize, and manage their social connections for the purposes of information sharing and consumption. Nonetheless, most social network users still report that building and curating contact groups is a time consuming burden. To help users overcome the burdens of contact discovery and grouping, Google+ recently launched a new feature known as circle sharing. The feature makes it easy for users to share the benefits of their own contact curation by sharing entire circles (contact groups) with others. Recipients of a shared circle can adopt the circle as a whole, merge the circle into one of their own circles, or select specific members of the circle to add. In this paper, we investigate the impact that circle-sharing has had on the growth and structure of the Google+ social network. Using a cluster analysis, we identify two natural categories of shared circles, which represent two qualitatively different use cases: circles comprised primarily of celebrities (celebrity circles), and circles comprised of members of a community (community circles). We observe that exposure to circle-sharing accelerates the rate at which a user adds others to his or her circles. More specifically, we notice that circle-sharing has accelerated the densification rate of community circles, and also that it has disproportionately affected users with few connections, allowing them to find new contacts at a faster rate than would be expected based on accepted models of network growth. Finally, we identify features that can be used to predict which of a user s circles (s)he is most likely to share, thus demonstrating that it is feasible to suggest to a user which circles to share with friends. Author Keywords Social Network, Google+, Circle Sharing ACM Classification Keywords H.5.3 Information Interfaces and Presentation: Group and Organization Interfaces Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WebSci 2012, June 22 24, 2012, Evanston, Illinois, USA. Copyright 2012 ACM $ General Terms Experimentation. INTRODUCTION Every day, hundreds of millions of users enjoy sharing and consuming information using online social network sites. At the same time, it can be difficult for users to discover new contacts and to maintain contact groupings (e.g., Google+ circles, Facebook friend lists)[21, 7]. Most contact management solutions focus on only one of these two tasks. A significant amount of research focuses on link prediction, which can be used to recommend new contacts to social network users [17, 14]. These recommendations are often made based on the user s existing connections, which means that they are less accurate for new users (the cold-start problem). Moreover, link prediction algorithms usually generate one recommendation at a time. On the other hand, contact grouping is notoriously difficult for users [21]. A number of data mining and machine learning approaches have been proposed and built to automatically group contacts [1, 4, 10], but none of them generates satisfactory user groups without user involvement. Further, existing tools typically cannot detect real-life communities until many of the community s interconnections are already captured in the online system [13]. As a result, new users and users of nascent social networks are often forced to manually curate and populate lists to capture the natural groupings among their contacts. In September 2011, Google+ launched a circle-sharing tool, which allows users to share their individual circles (i.e., contact groups) with other users [3]. A screenshot of the circlesharing tool is shown in Figure 1. Recipients of a shared circle can copy the circle as-is, merge the circle into one of their existing circles, or cherry-pick people from the circle to add to their own circles. In this paper, we provide a large-scale data-driven examination of the impact that circle-sharing has had on the Google+ social network, including a characterization of the usage patterns that have driven this impact. Our main contributions are the following: We observe that shared circles can be categorized into two distinct types: communities and celebrities. Based

2 Twitter lists, and friend suggestion tools provided by each of these sites). At the same time, finding and organizing one s contacts on a social networking site are still difficult tasks, largely due to the complex and faceted nature of users online social spheres [20, 12]. Figure 1. Screenshot of the circle-sharing tool. on structural features of the circles themselves, we use clustering techniques to discover two predominant clusters of shared circles, which correspond to intuitive and qualitatively different use cases. Circles in the first large cluster ( communities ) are characterized by high withincircle link density, high link reciprocity with the circle owner, and relatively low popularity among circle members. Circles in the second large cluster ( celebrities ) are characterized by low within-circle link density, low link reciprocity with the circle owner, and very high popularity among circle members. We provide the first large-scale study of the impact of contact-group sharing on the structure and growth of a social network. Past research (e.g., [15]) has observed that the features and prevailing use cases of a social networking site can have a substantial effect on the growth patterns and structure of the resulting network graph. Our results demonstrate that the circle-sharing feature accelerates the densification of community-type circles. We also observe that circle-sharing alleviates the cold start problem of link prediction; if circle-sharing is prevalent in a user s social neighborhood, this allows low-degree users to discover new contacts at a much faster rate than would be expected based on accepted models of network growth. We demonstrate the feasibility of algorithmically recommending circles that a user should share. We indentify features that can differentiate shared circles from ordinary circles (i.e., those created by users for personal use, but never shared with others). In particular, we show that shared circles are more commonly useful than ordinary circles. Using this characterization, we can recommend circles that are good candidates for sharing. RELATED WORK AND BACKGROUND Fully 65% of online adults are using social networking sites [6], and Facebook alone has over 800 million active users [2]. One of the prevailing purposes of a social networking site is to allow users to add and group their contacts for the purpose of information sharing and consumption. Almost all major social networking sites provide tools to help users find and group contacts (e.g., Google+ circles, Facebook user lists, A large body of prior work has focused on identifying and recommending potential contacts for social network users; most existing techniques involve viewing the social network as a graph (i.e., users as nodes and connections between users as edges) and recommending new edges in the graph based on existing edges in the graph [17, 14]. Such recommenders usually do not capture the underlying relationships between the recommendations. For example, although a recommender may find some of Alice s high school friends, it could not group them together and recommend the group as a whole. There are indeed some group recommendation algorithms [9], however they view group memberships as features of social network users, and make recommendations about which groups to join, rather than recommendations of adding a group of users as contacts. The other limitation of such recommenders is that they fail to provide good recommendations for users who have few existing connections. The recommenders are very dependent on the target user s existing connections. Therefore, it is often difficult for new users to find contacts. However, the cold start problem of new social networking users is not solely because of the ineffectiveness of contact recommenders. Social network researchers have established theoretically [8, 19] and experimentally [18] preferential attachment of social network edge creation process: new edges are more likely to be connected to users of large degrees than those of small degrees. For example, under the BA model of network growth [8], a social networking user with 100 contacts is 10 times more likely to add another contact before a social networking user with 10 contacts. Google+ circle sharing tools help high-degree users to share their connections with low-degree users, potentially alleviating the cold start problem for new (low-degree) users. There is also a body of literature about automatic groupcreation algorithms, which can be used to assist users with grouping contacts [1, 4, 10]. Unfortunately, each of these techniques requires user involvement to create final groupings, and the group creation process is isolated from membership suggestion. User list creation through crowdsourcing is also a possible solution if the members in the lists are all public figures [5]. However, this technique is less applicable to personalized local communities (e.g., families). In contrast, as we will demonstrate in our paper, the Google+ circle sharing tool can be successfully used for both celebrity circles (circles containing popular and public figures) and community circles (circles containing members of a local community or group). Finally, past research has observed that the network structure articulated by users of an online social network is often influenced by the features of the social network service and predominant use cases. For example, Kwak et al. observed 2

3 that the structure of the Twitter network is qualitatively different from other social networks, likely due to prevailing use cases (celebrity following and news consumption) [15]. Similarly, we observe that the Google+ circle-sharing feature has had a quantifiable impact on network growth and structure. Google+ Circle Sharing Feature In Google+, a user can create circles reflecting different facets in her social life. Each Google+ user has four default circles: friends, family, acquaintances and following. A user can also create other circles to describe other aspects of her life. If a user U A puts another user U B into any of her circles, then we say that U A is following U B. Connections on Google+ can be asymmetric (i.e., U A is following U B does not imply that U B is following back U A ). The circle sharing feature launched in Google+ in September This feature allows users to share their circles with other users. A user can choose to share any of her circles, and she can choose with whom she wants to share the circles. When a user notices that another user has shared a circle with her, she can decide to add some or all of the members in the shared circle as her own contacts. She can either add those members to one of her existing circles, or create a new circle for them. OVERVIEW OF THE ANALYSES The analyses presented in this paper are intended to answer three key questions: (1) Are there different types of shared circles, and how can we identify them? (2) What is the impact of circle-sharing on the structure and growth of the Google+ social network?, and (3) Can we recommend to users which of their circles are suitable to be shared? Data Overview All of our analyses are performed based on a large anonymized sample of Google+ circles and their adjacent edges. For each circle, we use identities of the person who shared it, the members of the shared circle, and the time of the circle share, We also use times when each node (member) joined Google+, and the circle membership edges in the network at the time of the study, along with the times the edges were created. All the user and circle IDs involved were then anonymized, and all other information on node and circle identities was scrubbed from the dataset before the study. For each different analysis, we sampled a subset of these circles according to the requirements of the analysis; details of the sampling are provided for each analysis. All analyses are based on at least 5,000 circles. Analysis Road Map In order to understand the impact of circle-sharing, it is first important to understand how people are utilizing the circlesharing feature (i.e., which circles they are sharing). We start by describing a clustering analysis. The analysis discovers two large categories of shared circles: communities and celebrities. Both categories of shared circles play an important role in the latter analyses. Then, we move on to study the effect that circle sharing has had on the growth and structure of the Google+ social graph. Using aggregated statistics about edge creation times, we demonstrate that sharing both types of circles accelerates the growth of the social network. We also observe that circlesharing accelerates the densification of community-type circles. Finally, we develop a model to distinguish shared circles from ordinary circles (i.e., circles that are not shared). One possible use for such a model is to recommend to users which of their circles are good candidates for sharing. We identify a feature called commonality which is predictive of a circle being shared. Using commonality as well as some other features, we investigate the feasibility of classifying circles as shared or not shared. We observe that sharing of community circles is more easily predicted than sharing of celebrity circles. CATEGORIZING SHARED CIRCLES In this section, we describe a cluster analysis with the goal of identifying different types of shared circles. Based on our analysis, we identify two large clusters of shared circles: those that contain primarily celebrities, and those that contain communities, or groups of people who are socially interconnected. Methodology The shared-circle cluster analysis is based on a random sample of 9000 shared circles with size 10. We use standard clustering techniques to group these circles on the basis of several key features. Some of the features (e.g., density) can be derived from understood features of communities in symmetric social networks [16, 13], while some features (e.g., reciprocity, popularity) are unique to asymmetric networks. Recall that Google+ connections can be asymmetric. Intuitively, the members of some of a user s circles (e.g., the user s Cousins or Book Club circles) are more likely to follow the user back than the members of other circles (e.g., the user s Music Stars circle). To capture the extent to which the users in a circle follow back the circle s owner, we define the reciprocity feature of a circle. DEFINITION 1. Reciprocity The reciprocity of a circle is defined as the proportion of the circle members who follow back the circle owner. EXAMPLE 1. Figure 2 describes a social network of 4 users: Alice, Bob, Claire, and Dan. Suppose that users have the circles shown, with an edge from A to B indicating that B is in some of A s circles. Alice s circle K contains Bob, Claire and Dan. The reciprocity of K is 1/3 = 0.33, since among the members of K, only Bob follows Alice. To better understand the reciprocity feature, for all of the shared circles in our sample, we compute their reciprocities and plot the probability density function for their reciprocities (Figure 3(a)). 1 It is interesting to observe that this dis- 1 Note that probability density at a given point can be larger than 1. 3

4 (a) Reciprocity. (b) Density. (c) Popularity. Figure 3. Probability density distributions of different circle features. Bob Claire Finally, we define the popularity of a circle based on the number of people who are following the circle s members. DEFINITION 3. Popularity The popularity of a user is defined as the in-degree (i.e., the number of followers) of the user in the social network. The popularity of a circle is defined as median popularity of its members. K Alice Dan Figure 2. An example social network of 4 users. Each user has exactly one circle, and circle memberships are represented by outgoing edges. tribution is heavily bimodal; in other words, shared circles tend to have either high or low reciprocity. In addition to the owner, the individual members of a circle can be connected to one another. For example, we would expect members of a family circle to be well connected to one another. To capture the degree to which the members of a circle are interconnected, we define the density feature of a circle. DEFINITION 2. Density The density of a circle is defined as the actual number of bi-directional edges between circle members divided by the maximum possible number of bidirectional edges (i.e., n(n 1) 2 if the size of the circle is n). EXAMPLE 2. In Figure 2, among members of Alice s circle K, there is only one bi-directional edge Bob Claire. The maximum possible number of such edges is = 3, so the density of K is 1 3. Figure 3(b) shows the probability density function of circle density, as measured from our sample of shared circles. The function reaches its peak at 0.1, although there are indeed circles with density of 1, indicating existences of fully connected circles. EXAMPLE 3. In Figure 2, K s members have popularities 3 (Bob), 3 (Claire), and 1 (Dan). The popularity of K is thus 3 (the median of {1, 3, 3}). Note that we use median instead of mean of member popularities as the circle popularity because the distribution of individual popularity is very heavy-tailed: a few users have upward of millions of followers, but most have a modest number, which would make the mean popularity dominated by a circle s most popular members. Figure 3(c) shows the probability density distribution of circle popularity, measured using our sample of shared circles. We observe that the function reaches its peak around 200, although there are still a significant number of circles with very high popularity (e.g., >1000). There are undoubtedly other features (besides reciprocity, density, and popularity) that are useful for characterizing circles. Circle name is another logical feature to consider. For example, we would expect a circle named Family to represent a community (with high density and high reciprocity); we would expect a circle named Following to include a set of celebrities (with low reciprocity and high popularity). Unfortunately, in many cases, the circle name alone is insufficient. For example, a circle named Photographer could represent a community or a group of celebrity photographers; in order to distinguish the two cases, we would end up looking at the structure of the network graph. For these reasons, the remainder of our analysis focuses on structural features, but future work could, with appropriate privacy safeguards, incorporate semantic signals from circle names, circle-share post annotations, and more sophisticated signals of user engagement with the circles. 4

5 Figure 4. Within clusters sum-of-squares for different k when performing k-means circle clustering. Circle Clustering Using reciprocity, popularity, and density as features, we applied a standard clustering technique (k-means) to the shared circles in our sample. Of course, circles (as well as their feature values) change over time, and we use the feature values at the time when each circle was shared. Before clustering, we pre-processed the data in two ways: (1) Because the popularity value is heavily skewed, we transform this feature by taking its log. (2) We applied the scale() function in R to normalize each of the features. As a second preliminary step, we computed the within-clusters sum-of-squares for different possible values of k (k = 2..15), and selected k = 4 by visually observing the natural knee in the trend plot [11] of the within-cluster sum-of-squares, in Figure 4. The result of clustering based on the processed features is shown in Figure 5. Each triple of bars represent the mean processed feature values of a circle cluster. Since the feature values are normalized, the numbers in the figure indicate a feature s relative, rather than absolute, value. The aggregate results of real feature values (after reversing the normalization) are shown in Table 1. The first two clusters of circles are of high reciprocity and relatively low popularity, indicating that members of those circles are most likely to be ordinary users who are friends with the circle owners, and the circles are very likely to describe real life communities like families or groups of friends. Therefore we call them community circles. We also notice that the circles in Cluster 1 are more dense than those in Cluster 2, which suggests that some community circles have been well-developed, while others are still nascent. These two clusters of circles combined comprise 52% of all shared circles. In contrast, the circles in Clusters 3 and 4 are of high popularity and low reciprocity. This is in particular true for circles Figure 5. Shared circle clustering using k-means (k = 4) algorithm. in Cluster 4; their median popularity is more than 20000, and the mean reciprocity is only We call circles in these two clusters celebrity circles, since they mostly contain famous (i.e., high in-degree ) people, and the connections to them are mostly single-directional. It is interesting to observe that celebrity circles, especially those in Cluster 4, have moderate densities. This suggests that some of those celebrities are connected to each other. Circles in Clusters 3 and 4 comprise the remaining 48% of all the shared circles. Cluster Reciprocity Density Popularity ID (mean) (mean) (median) Table 1. Aggregated statistics of circle clusters. IMPACT OF SHARED CIRCLES In this section, we now turn our attention to understanding the impact that the Google+ circle-sharing feature has had on the growth and structure of the network. We describe a large-scale quantitative study, the results of which are the following important observations: Circle-sharing events accelerate the creation of edges in the network. In particular, we find that circle-sharing events accelerate the densification of community circles. We hypothesize that circle-sharing accelerates the popularity of celebrities, but we are not able to confirm this hypothesis for reasons described in detail below. We find that circle-sharing disproportionately accelerates the growth of edges involving low-degree users. After being exposed to a shared circle, the degrees of low-degree users increase at a rate higher than predicted by accepted models of network growth. 5

6 Among users who are exposed to shared circles, circlesharing accelerates the rate at which circles are created, and the rate at which new people are added to circles. Methodology To understand how circle-sharing events have affected circles and users, we identify important circle- and user-related metrics (e.g., the density of a circle), and measure their values before and after the circle or user is affected by the circle-sharing feature (we will define what we mean by affected for each analysis). Since each circle (user) is affected by circle-sharing at a different time, to summarize the changes of multiple circles (users), we group circles (users) in our dataset into cohorts according to the week in which they are affected by circle sharing. For each cohort of circles (users), we can then measure the changes in these metrics over time to understand the effect of circle sharing. Accelerating Edge Growth We start by investigating whether and how circle-sharing events affect the speed at which new edges are added to the social graph. Intuitively, we expect that when a circle is shared, it will draw the attention of other users (the recipients of the shared circle) to its members. As a result, we expect that the number of people following the circle members (in-edges) will increase very quickly soon after the circle is shared. In addition to accelerating edge growth overall, we also hypothesize that circle-sharing events will affect the network differently, depending on whether the shared circle is a community or celebrity circle. Specifically, anecdotal evidence suggests that community circles (e.g., the Knitting Club circle) are often shared with users who are also members of the community. Thus, we suspect that circle-sharing will contribute to the densification of the community, as members adopt the shared circle. In contrast, we expect that shared celebrity circles (e.g., the Rock Stars circle) will serve primarily to accelerate the popularity of circle members. To verify these hypotheses, we use the same sample of shared circles as in the previous section, first categorizing them into community and celebrity circles, and then dividing them into cohorts based on the week during which they were shared. Density increase of community circles In the previous section we defined circle density. However, the density of a circle at any point in time is dependent not only on the number of edges in the circle, but also on the number of members in the circle. In order to reason about changes in density due to edge growth, in the following analyses, in this section we use density to specifically refer to the density of edges among a circle s members at a globally-fixed date shortly before the beginning of the study period. For each weekly cohort of community circles, we compute their mean density over time and plot the trend. Figure 6(a) shows the density trend over time of the circle cohort C Nov2 (i.e., circles shared during the week of November 2-8). We notice that, aside from week of November 2, the growth of circle density is mostly linear. However, during the week when the circle sharing events happen, we notice an obvious jump in circle density. The same observation also holds for other weeks. To better understand the density increase trend and the acceleration of density increase during the circle-sharing week, we compute the density increase for each week, and compare the weekly density increase of the circle sharing-week to that of other weeks. The weekly density increase value D w (c) of a circle c for timestamp w, expressed in weeks, is defined by: D w (c) = D w+1.0 (c) D w (c). (1) Based on weekly density increases, we compute the sharingweek acceleration rate R D, which captures the amount of density increase during the week when the circle got shared, w c (rounded to the beginning of the week), as compared to the previous week: R D (c) = D w c (c) D wc 1.0(c), (2) The mean R D for all the shared circles in our sample is 2.5. In other words, the mean density acceleration is 150% during the week when the circle is shared. Finally, we perform a one-sample t-test to see if the density increase during the circle sharing week is significantly better than other weeks. We computed the p-value for each circle cohort separately, and the density increase acceleration brought by circle-sharing was statistically significant with p < 0.05 for all weeks. Impact on popularity in celebrity circles We also performed a similar analysis to test the hypothesis that circle-sharing accelerates the popularity in celebrity circles. We see anecdotal evidence that in some cases, circle-sharing events are helping celebrity circles attract a significant number of new followers. However, our analysis did not show such a growth with statistical significance. One possible explanation is that celebrity circles usually attract hundreds or thousands of followers, while circles are often shared with smaller groups of people. Even if the circle owner shares it publicly, the impact of the action is likely mostly limited to those who follow the sharer. Thus, while the circle-sharing event may bring in new edges, the total number of new edges is likely to be small in comparison to the number of users already following the celebrities. Nonetheless, multiple shares of the same circle of celebrities can attract larger audiences, and a closer look at the impact of being included in many shared circles is an interesting topic for future research. Structural Impact on Edge Growth So far we have demonstrated how circle-sharing events are accelerating the network growth in term of edge additions, but we also want to see if circle-sharing events are affecting the structural properties of the network. Most social network growth exhibits a phenomenon called preferential attach- 6

7 (a) Circle density. (b) Number of circles. (c) Circle Size. Figure 6. Mean values of various circle metrics, for users who became circle-sharing-touched (Figure 6(b) and 6(c)) or for circles got shared (Figure 6(a)) during the week of November 2 8. The beginning and end of the circle sharing week are indicated by the dashed lines. (The y-axis has been descaled to protect proprietary information.) ment [8, 19, 18]; new edges are more likely to be connected to large-degree nodes than smaller-degree nodes. The circlesharing feature makes it easier for both low-degree and highdegree users to discover groups of contacts, and low degree users might even benefit more since they may find more new contacts from a shared circle. Therefore we expect the difference between edge growth rates for low- and high-degree users becomes smaller as a result. To test this hypothesis, we chose a random sample of users who were members of a circle that got shared, and divided them into cohorts based on the number of bidirectional edges they had before the relevant circle sharing event, and measured, for 3 example cohorts, the change in the number of new bidirectional edges created the week before the relevant circle share, and the week after. The results, with the degree change figures descaled, are shown in Table 2. As we have seen in the previous analysis, all the users can benefit from circle sharing in terms of making new connections. However, this is particularly true for low-degree users. During the week immediately after circle-sharing events, users of degree 10 make 1.63 times more connections than they did the week before. In contrast, users of degree of 100 make 1.07 times as many as the week before. Before circles are shared, users of degree 100 add 4 times as many connections as users of degree 10. After circle-sharing events, users of degree 100 only add 2.6 times as many connections as users of degree 10. Therefore, circle sharing is indeed changing the network growth process by giving low degree users better chances to make new connections. Degree when shared Weekly link creations before share Weekly link creations after share Link creation acceleration ratio Table 2. Degree of user vs. new bidirectional link creations per week before and after a circle-sharing event. (The six weekly link creation rate averages are rescaled to protect proprietary information.) Circle Creation and Expansion of Recipients Next we examine whether and how shared circles are adopted or used by their recipients. Upon seeing a shared circle, if the recipient decides to add some or all of the contacts in the shared circle, she has two choices: add the contacts to one of her existing circles, or create a new circle for the contacts. To verify the adoption of these two types of shared circleadoption behaviors, we select groups of users that are recipients of shared circles and see if they are expanding their existing circles and creating new circles as a result of seeing shared circles. (Note that data about shared-circle uptake events was not available, so we had to observe these behaviors indirectly by observing changes in circle sizes and changes in the number of circles owned by a user.) To perform the analyses, we randomly sampled users that became circle-sharing-touched between September and December, We say a user becomes circle-sharingtouched if the user shares a circle or is a member of a shared circle. There are other ways to define circle-sharing-touched, but our main goal is to isolate a set of users that are likely to have been recipients of a shared circle. Let w(u) denote the timestamp, in weeks, of when user u was first touched by circle-sharing, rounded down to the beginning of the calendar week to define weekly cohorts. Number of circles owned per user. We first compute the mean number of circles owned by different cohorts of users over time. If users are adopting shared circles they see and creating new circles for them, then we would expect the mean number of circles owned by users to increase faster when the users become circle-sharing-touched. For each user cohort, we compute the mean number of circles owned by the users over time; we show the trend of one example weekly cohort (those who became circle-sharing-touched during the week of November 2 8) in Figure 6(b). We see that users create more circles during the week they become circle-sharing-touched. (Similar observations can be made for other groups, but are omitted for space.) 7

8 Following the same process we used in the previous analysis for circle density, we compute the weekly increase in circle count C: C w (u) = C w+1.0 (c) C w (c), and then compute sharing week acceleration rate as: R C (u) = C w(u)(u) C w(u) 1.0 (u). The mean R C (u) for all selected users is 2.2. A one-sample t-test showed that users create statistically significantly more circles after getting touched by circle sharing, with p < 0.05 for each weekly cohort separately. This is a strong indication that these users are creating new circles based on the shared circles they see. Mean circle size. Finally, we measure the mean sizes of circles owned by each cohort of users, before and after the owners become circle-sharing-touched. If users are adopting the shared circles they see by adding all or some members of the shared circle into their existing circles, then we would expect the mean size of existing circles owned by users to increase more quickly when the users become touched by circle sharing. For each user group, we compute the mean size of the associated circles over time and show the trend of the example cohort (first touched by circle sharing during the week of Nov 2) in Figure 6(c). We see that circles expand faster during the week when their owners first became circlesharing-touched. (Again, the same observations are true for other user groups.) Similar to the circle count case, we also compute the sharing week acceleration rate for circle size increase and compute the p-values for statistical significance test. The mean acceleration rate for circle size is 1.9, and all of the p-values for different cohorts are below These results demonstrate that users expand their existing circles faster when they become circle-sharing-touched, which is a strong indication that users are adding contacts to existing circles from the shared circles they see. RECOMMENDING CIRCLES TO SHARE With the impact of circle sharing events in mind, in this section, we focus our efforts on distinguishing shared circles from ordinary circles (i.e., those that do not get shared). Ultimately, this has interesting applications, including recommending to the user which of his circles are good candidates for sharing. We identify a quantitative feature of circles, which we call commonality, and we demonstrate how commonality can be used to recommend circles for sharing. One of the main goals of circle sharing it to let other users reuse all or part of a shared circle to create similar circles. Thus, intuitively, we expect that the circles that are the best candidates for sharing are those that are of common interest, or useful to many people. Following this reasoning, we suspect that if many users have already constructed the same circle (or a circle containing a very similar set of people), then that is a good indication that the circle is a prime candidate for sharing. Following this intuition, we define a property of a circle c called commonality, which summarizes the extent to which other users have constructed a circle that is similar to c. Before describing the details of the commonality definition, we first define the co-existance probability of two users to capture the frequency with which two users co-occur in the same circles. In particular, the co-existance probability of two users is defined as the average conditional probability that having one user in a circle would result the other user also being in the same circle 2. EXAMPLE 4. Consider the social network in Figure 2. Claire is in 3 circles and Dan is in 1 circle. They co-occur in 1 circle. The conditional probability that a circle including Claire would also include Dan is 1/3 = 0.33, the conditional probability that a circle including Dan would also include Claire is 1/1 = 1. Therefore, the co-existence probability of Claire and Dan is ( )/2 = Based on the co-existence probability of two users, we can then define commonality as follows: DEFINITION 4. (Global) Commonality The commonality of a circle is defined as the average co-existence probability, taken over all pairs of users in the circle. If there exist many other circles (created by other users) that are similar to circle c, the we expect c to have high commonality; otherwise, it should have low commonality. Since we consider circles owned by all social network users when computing the co-existence conditional probability, we also call it global commonality (in analogy to local commonality, which we will define later). EXAMPLE 5. Consider the social network in Figure 2. Alice s circle contains three pairs of users: Bob and Claire with co-existence probability of 1, Bob and Dan with coexistence probability of 0.67, Claire and Dan with co-existence probability of Therefore, the commonality of Alice s circle is ( )/3 = To compare shared circles and ordinary circles, we randomly selected 9000 shared circles and 9000 ordinary circles. To make sure the owners of ordinary circles are aware of the option of circle sharing, when sampling the ordinary circles, we only consider circles owned by a user who has shared at least one circle. Using this data set, we compute the probability density function of global commonality, for both shared circles and ordinary circles (Figure 7(a)). As expected, shared circles tend to have higher global commonality than ordinary circles. Note that global commonality considers all social network users circles when computing co-existence probabilities. We suspect that this will be less meaningful for community circles, since the members of a community circle are likely to be of interest to only a small subset of the social network s users. (On the other hand, members of celebrity circles tend 2 For consistency, we assume the circle owners are also in their own circles when computing co-existence probability. 8

9 (a) GlobalCommonality. (b) LocalCommonality. (c) Reciprocity. Figure 7. A comparison of shared and ordinary circles based on the probability density function of different features. to be of more global interest.) To capture this intuition, we define local commonality as follows: DEFINITION 5. Local Commonality The local commonality of a circle is defined as the average co-existence probability (considering only those circles owned by members of the given circle), computed over all pairs of users in the circle. EXAMPLE 6. Consider the social network in Figure 2, and imagine there is an additional user Eva who has Bob, Claire and Dan in her circle. When computing the local commonality for Alice, Eva s circle would be ignored since Eva is not in Alice s circle; however in the case of global commonality, Eva s circle would be considered. We show the probability density functions of local commonality for both shared and ordinary circles in Figure 7(b). Similar to the global commonality case, shared circles are of higher local commonality comparing to ordinary circles, although the difference is even larger comparing to the global commonality case. This indicates that local commonality could be a better feature to distinguish shared and ordinary circles than global commonality. Aside from local and global commonalities, the features mentioned in previous sections (e.g., reciprocity, density, popularity) can also be used to distinguish shared and ordinary circles. For example, we show the probability density functions of reciprocity for shared and ordinary circles in Figure 7(c). Compared to ordinary circles, shared circles are more likely to have very high or low reciprocity. We also notice that, even for non-shared circles, there is a tendency for circles to have either very high or very low reciprocity, which indicates that the categorization of circles into two types celebrity and community is applicable to circles in general, but that the phenomenon is more pronounced for shared circles. In the following, we categorize all the circles (i.e., the union of all sampled shared and ordinary circles) in our dataset into celebrity and community circles and compute the correlation between each feature (reciprocity, popularity, density, local and global commonality) and the circle sharing decision. Of course, some outlier circles do not fit into either of the two categories (celebrity or community). However, this is actually a good indication that they are less likely to be shared (e.g., see Figure 7(c)). Therefore, it is less sensitive to which category we put them into. The Pearson correlation coefficients between circle features and sharing decisions for both celebrity and community categories are shown in Figure 3. We notice that in both the celebrity and community cases, global commonality, local commonality, popularity and density have positive correlation with circle sharing. As expected, reciprocity is positively correlated with circle sharing for community circles, but negatively correlated with circle sharing for celebrity circles. We also notice that, for all of these features, they are more correlated with sharing behavior for community circles, indicating that recommendation for community circles can be made with better accuracy than celebrity circles. Feature Correlation to sharing (celebrity) (community) GlobalCommonality LocalCommonality Reciprocity Popularity Density Table 3. Correlation of sharing with various features. For both community and celebrity circles. In summary, these results suggest that we can recommend to a user to share a circle if either (1) it is a community circle, and it has high reciprocity, popularity, density, local and global commonality, or (2) it is a celebrity circle, and it has low reciprocity, high popularity, high density, and high local and global commonality. We built such a recommender using an SVM classifier and the proposed features to test the feasibility of such recom- 9

10 mendation. This is a difficult problem since a user might make the sharing decision for various unpredictable reasons (e.g., some users might just want to try out the circle sharing feature and randomly pick some circles to share). To evaluate the precision and recall of the recommendation, for both the celebrity and community circles, we use 2/3 of them as training data to train a classifier using the circles features mentioned above, then we compute the precision and recall for the recommendations on the remaining 1/3 testing data. The results are shown in Table 4. Compared to celebrity circles, sharing of community circles can be predicted more accurately, although recalls and predictions in both cases are not very high. Better predictions might be achieved by considering additional features like time of sharing, the sharer s online activity history, etc., and the details are left as future work. Circle group Precision Recall Community Celebrity Table 4. Circle sharing prediction. Finally, recalling that circles can be shared publicly or to selected smaller audiences, we examine the ACL d recipients of shared circles. For simplicity, we consider just two categories (public to everyone, and selective, meaning that the circles was shared with a smaller group of people). For both celebrity and community cases, most circles are shared privately, although, unsurprisingly, celebrity circles are more likely to be shared publicly than community circles. Circle group Public Selective Community 25% 75% Celebrity 37% 63% Table 5. Targets of shared circles. CONCLUSION In this paper, we provided the first large-scale study of the usage and impact of a contact-group sharing tool, the Google+ circle-sharing feature. We identified two different types of shared circles, communities and celebrities, which are characterized by different structural properties (density, reciprocity, and popularity), and which also represent qualitatively different use cases for the feature. We also observed that the circle-sharing feature has had measurable effects on the growth and structure of the social network graph. Edges among circle members grow 150% faster during the week the circle gets shared. Recipients of shared circles create significantly more new circles and add significantly more people to their existing circles based on the shared circles. Finally, we demonstrate the feasibility of recommending to users which circles they should share with friends. We propose a feature called commonality that captures the potential benefits to share a circle. Using commonality and other circle features, we build a recommender and show that circle sharing events, especially those associated with community circles, can be predicted with reasonable precision and recall. In the future, we plan to study the interaction among different circle-sharing events. It would be interesting to know if one circle-sharing event often triggers others, and if yes, how such events propagate through the social network. We also plan to explore how to combine the power of contact-group sharing tools with the intelligence of friend recommenders based on link prediction. ACKNOWLEDGEMENTS We re grateful to Ed Chi for his help with brainstorming at the inception of this work, and to Owen Prater and Andy Hertzfeld for building the Google+ circle sharing feature and helping us understand the implementation details. Lujun Fang was partially supported by NSF Grant CNS REFERENCES 1. Facebook smart list Facebook statistics Google circle-sharing feature release news Katango Listorious Social networking site coverage P. Adams. Grouped: How small groups of friends are the key to influence on the social web. New Riders Press, R. Z. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Group formation in large social networks: Membership, growth, and evolution. In SIGKDD, K. Bacon and P. Dewan. Towards automatic recommendation of friend lists. In CollaborateCom, B. Everitt and T. Hothorn. A Handbook of Statistical Analyses Using R. Chapman and Hall/CRC, S. Farnham and E. Churchill. Faceted identity, faceted lives: social and technical issues with being yourself online. In CSCW, S. Fortunato. Community detection in graphs. Physics Reports, L. Getoor and C. P. Diehl. Link mining: A survey. SigKDD Explorations Special Issue on Link Mining, H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a soial network or a news media? In WWW, J. Leskovec, K. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In WWW, D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, A. Mislove, H. S. Koppula, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Growth of the flickr social network. In WOSN, M. Newman. Clustering and preferential attachment in growing networks. Physics Review E, F. Ozenc and S. Farnham. Life modes in social media. In CHI, M. Skeels and J. Grudin. When social networks cross boundaries: a case study of workplace use of facebook and linkedin. In Group,

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Extracting Information from Social Networks

Extracting Information from Social Networks Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1 Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there

More information

The Impact of YouTube Recommendation System on Video Views

The Impact of YouTube Recommendation System on Video Views The Impact of YouTube Recommendation System on Video Views Renjie Zhou, Samamon Khemmarat, Lixin Gao College of Computer Science and Technology Department of Electrical and Computer Engineering Harbin

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Predicting Box Office Success: Do Critical Reviews Really Matter? By: Alec Kennedy Introduction: Information economics looks at the importance of

Predicting Box Office Success: Do Critical Reviews Really Matter? By: Alec Kennedy Introduction: Information economics looks at the importance of Predicting Box Office Success: Do Critical Reviews Really Matter? By: Alec Kennedy Introduction: Information economics looks at the importance of information in economic decisionmaking. Consumers that

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set. SKEWNESS All about Skewness: Aim Definition Types of Skewness Measure of Skewness Example A fundamental task in many statistical analyses is to characterize the location and variability of a data set.

More information

by Maria Heiden, Berenberg Bank

by Maria Heiden, Berenberg Bank Dynamic hedging of equity price risk with an equity protect overlay: reduce losses and exploit opportunities by Maria Heiden, Berenberg Bank As part of the distortions on the international stock markets

More information

3.3 Patterns of infrequent interactions 3. PAIRWISE USER INTERACTIONS. 3.1 Data used. 3.2 Distribution of the number of wall posts

3.3 Patterns of infrequent interactions 3. PAIRWISE USER INTERACTIONS. 3.1 Data used. 3.2 Distribution of the number of wall posts On the Evolution of User Interaction in Facebook Bimal Viswanath Alan Mislove Meeyoung Cha Krishna P. Gummadi Max Planck Institute for Software Systems (MPI-SWS) Rice University Kaiserslautern/Saarbrücken,

More information

Beating the MLB Moneyline

Beating the MLB Moneyline Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

More information

The ebay Graph: How Do Online Auction Users Interact?

The ebay Graph: How Do Online Auction Users Interact? The ebay Graph: How Do Online Auction Users Interact? Yordanos Beyene, Michalis Faloutsos University of California, Riverside {yordanos, michalis}@cs.ucr.edu Duen Horng (Polo) Chau, Christos Faloutsos

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing

More information

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2 Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

EST.03. An Introduction to Parametric Estimating

EST.03. An Introduction to Parametric Estimating EST.03 An Introduction to Parametric Estimating Mr. Larry R. Dysert, CCC A ACE International describes cost estimating as the predictive process used to quantify, cost, and price the resources required

More information

Strong and Weak Ties

Strong and Weak Ties Strong and Weak Ties Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz April 11, 2016 Elisabeth Lex (KTI, TU Graz) Networks April 11, 2016 1 / 66 Outline 1 Repetition 2 Strong and Weak Ties 3 General

More information

An Introduction to. Metrics. used during. Software Development

An Introduction to. Metrics. used during. Software Development An Introduction to Metrics used during Software Development Life Cycle www.softwaretestinggenius.com Page 1 of 10 Define the Metric Objectives You can t control what you can t measure. This is a quote

More information

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation

Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Shaghayegh Sahebi and Peter Brusilovsky Intelligent Systems Program University

More information

An Analysis of Verifications in Microblogging Social Networks - Sina Weibo

An Analysis of Verifications in Microblogging Social Networks - Sina Weibo An Analysis of Verifications in Microblogging Social Networks - Sina Weibo Junting Chen and James She HKUST-NIE Social Media Lab Dept. of Electronic and Computer Engineering The Hong Kong University of

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Lab 11. Simulations. The Concept

Lab 11. Simulations. The Concept Lab 11 Simulations In this lab you ll learn how to create simulations to provide approximate answers to probability questions. We ll make use of a particular kind of structure, called a box model, that

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

HOW SOCIAL ARE SOCIAL MEDIA PRIVACY CONTROLS?

HOW SOCIAL ARE SOCIAL MEDIA PRIVACY CONTROLS? HOW SOCIAL ARE SOCIAL MEDIA PRIVACY CONTROLS? Gaurav Misra, Jose M. Such School of Computing and Communications Lancaster University, UK g.misra@lancaster.ac.uk, j.such@lancaster.ac.uk Abstract: Social

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Principal components analysis

Principal components analysis CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

More information

Descriptive statistics parameters: Measures of centrality

Descriptive statistics parameters: Measures of centrality Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

More information

# # % &# # ( # ) + #, # #./0 /1 & 2 % 3 4 2 5 3 6 6 7 & 6 4 & 4 # 6 76 /0 / 6 7 & 6 4 & 4 # // 8 / 5 & /0 /# 6222 # /90 8 /9: ; & 0 0 6 76 /0 /!<!

# # % &# # ( # ) + #, # #./0 /1 & 2 % 3 4 2 5 3 6 6 7 & 6 4 & 4 # 6 76 /0 / 6 7 & 6 4 & 4 # // 8 / 5 & /0 /# 6222 # /90 8 /9: ; & 0 0 6 76 /0 /!<! ! # # % &# # ( # ) + #, # #./0 /1 & 2 % 3 4 2 5 3 6 6 7 & 6 4 & 4 # 6 76 /0 / 6 7 & 6 4 & 4 # // 8 / 5 & /0 /# 6222 # /90 8 /9: ; & 0 0 6 76 /0 /!

More information

ModelingandSimulationofthe OpenSourceSoftware Community

ModelingandSimulationofthe OpenSourceSoftware Community ModelingandSimulationofthe OpenSourceSoftware Community Yongqin Gao, GregMadey Departmentof ComputerScience and Engineering University ofnotre Dame ygao,gmadey@nd.edu Vince Freeh Department of ComputerScience

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

1. The RSA algorithm In this chapter, we ll learn how the RSA algorithm works.

1. The RSA algorithm In this chapter, we ll learn how the RSA algorithm works. MATH 13150: Freshman Seminar Unit 18 1. The RSA algorithm In this chapter, we ll learn how the RSA algorithm works. 1.1. Bob and Alice. Suppose that Alice wants to send a message to Bob over the internet

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Principles of Data-Driven Instruction

Principles of Data-Driven Instruction Education in our times must try to find whatever there is in students that might yearn for completion, and to reconstruct the learning that would enable them autonomously to seek that completion. Allan

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals

Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals Omer Duskin Dror G. Feitelson School of Computer Science and Engineering The Hebrew University

More information

Nursing Journal Toolkit: Critiquing a Quantitative Research Article

Nursing Journal Toolkit: Critiquing a Quantitative Research Article A Virtual World Consortium: Using Second Life to Facilitate Nursing Journal Clubs Nursing Journal Toolkit: Critiquing a Quantitative Research Article 1. Guidelines for Critiquing a Quantitative Research

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

More information

Empirical Methods in Applied Economics

Empirical Methods in Applied Economics Empirical Methods in Applied Economics Jörn-Ste en Pischke LSE October 2005 1 Observational Studies and Regression 1.1 Conditional Randomization Again When we discussed experiments, we discussed already

More information

Dissecting the Learning Behaviors in Hacker Forums

Dissecting the Learning Behaviors in Hacker Forums Dissecting the Learning Behaviors in Hacker Forums Alex Tsang Xiong Zhang Wei Thoo Yue Department of Information Systems, City University of Hong Kong, Hong Kong inuki.zx@gmail.com, xionzhang3@student.cityu.edu.hk,

More information

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS Carlos Andre Reis Pinheiro 1 and Markus Helfert 2 1 School of Computing, Dublin City University, Dublin, Ireland

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

USES OF CONSUMER PRICE INDICES

USES OF CONSUMER PRICE INDICES USES OF CONSUMER PRICE INDICES 2 2.1 The consumer price index (CPI) is treated as a key indicator of economic performance in most countries. The purpose of this chapter is to explain why CPIs are compiled

More information

Comparing Methods to Identify Defect Reports in a Change Management Database

Comparing Methods to Identify Defect Reports in a Change Management Database Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification. Heatmaps Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

More information

THE SECURITY EXPOSURE

THE SECURITY EXPOSURE Secunia Whitepaper - February 2010 THE SECURITY EXPOSURE OF SOFTWARE PORTFOLIOS An empirical analysis of the patching challenge faced by the average private user In this paper, we examine the software

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

More information

Social Network Mining

Social Network Mining Social Network Mining Data Mining November 11, 2013 Frank Takes (ftakes@liacs.nl) LIACS, Universiteit Leiden Overview Social Network Analysis Graph Mining Online Social Networks Friendship Graph Semantics

More information

Server Load Prediction

Server Load Prediction Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

Educational Social Network Group Profiling: An Analysis of Differentiation-Based Methods

Educational Social Network Group Profiling: An Analysis of Differentiation-Based Methods Educational Social Network Group Profiling: An Analysis of Differentiation-Based Methods João Emanoel Ambrósio Gomes 1, Ricardo Bastos Cavalcante Prudêncio 1 1 Centro de Informática Universidade Federal

More information

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013 ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION, Fuel Consulting, LLC May 2013 DATA AND ANALYSIS INTERACTION Understanding the content, accuracy, source, and completeness of data is critical to the

More information

Crowd sourced Financial Support: Kiva lender networks

Crowd sourced Financial Support: Kiva lender networks Crowd sourced Financial Support: Kiva lender networks Gaurav Paruthi, Youyang Hou, Carrie Xu Table of Content: Introduction Method Findings Discussion Diversity and Competition Information diffusion Future

More information

April 2016. Online Payday Loan Payments

April 2016. Online Payday Loan Payments April 2016 Online Payday Loan Payments Table of contents Table of contents... 1 1. Introduction... 2 2. Data... 5 3. Re-presentments... 8 3.1 Payment Request Patterns... 10 3.2 Time between Payment Requests...15

More information

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

1. Write the number of the left-hand item next to the item on the right that corresponds to it. 1. Write the number of the left-hand item next to the item on the right that corresponds to it. 1. Stanford prison experiment 2. Friendster 3. neuron 4. router 5. tipping 6. small worlds 7. job-hunting

More information

Review of Fundamental Mathematics

Review of Fundamental Mathematics Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

More information

Improving SAS Global Forum Papers

Improving SAS Global Forum Papers Paper 3343-2015 Improving SAS Global Forum Papers Vijay Singh, Pankush Kalgotra, Goutam Chakraborty, Oklahoma State University, OK, US ABSTRACT Just as research is built on existing research, the references

More information

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups Abstract Yan Shen 1, Bao Wu 2* 3 1 Hangzhou Normal University,

More information

1. The Fly In The Ointment

1. The Fly In The Ointment Arithmetic Revisited Lesson 5: Decimal Fractions or Place Value Extended Part 5: Dividing Decimal Fractions, Part 2. The Fly In The Ointment The meaning of, say, ƒ 2 doesn't depend on whether we represent

More information

AN ANALYSIS OF A WAR-LIKE CARD GAME. Introduction

AN ANALYSIS OF A WAR-LIKE CARD GAME. Introduction AN ANALYSIS OF A WAR-LIKE CARD GAME BORIS ALEXEEV AND JACOB TSIMERMAN Abstract. In his book Mathematical Mind-Benders, Peter Winkler poses the following open problem, originally due to the first author:

More information

IMPLEMENTATION NOTE. Validating Risk Rating Systems at IRB Institutions

IMPLEMENTATION NOTE. Validating Risk Rating Systems at IRB Institutions IMPLEMENTATION NOTE Subject: Category: Capital No: A-1 Date: January 2006 I. Introduction The term rating system comprises all of the methods, processes, controls, data collection and IT systems that support

More information

CPA Perfomance Trends on the Google Display Network

CPA Perfomance Trends on the Google Display Network CPA Perfomance Trends on the Google Display Network Google Display Network White Paper CPA Performance Trends on the Google Display Network About the Google Display Network The Google Display Network offers

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Georgia State University Social Media Toolbox. Prepared by: Terry Coniglio, University Relations Assistant Director, Social Media

Georgia State University Social Media Toolbox. Prepared by: Terry Coniglio, University Relations Assistant Director, Social Media Georgia State University Social Media Toolbox Prepared by: Terry Coniglio, University Relations Assistant Director, Social Media Social Media has changed the way we at Georgia State University communicate

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

Practical Research. Paul D. Leedy Jeanne Ellis Ormrod. Planning and Design. Tenth Edition

Practical Research. Paul D. Leedy Jeanne Ellis Ormrod. Planning and Design. Tenth Edition Practical Research Planning and Design Tenth Edition Paul D. Leedy Jeanne Ellis Ormrod 2013, 2010, 2005, 2001, 1997 Pearson Education, Inc. All rights reserved. Chapter 1 The Nature and Tools of Research

More information

Sampling and Sampling Distributions

Sampling and Sampling Distributions Sampling and Sampling Distributions Random Sampling A sample is a group of objects or readings taken from a population for counting or measurement. We shall distinguish between two kinds of populations

More information

Analysis of academy school performance in GCSEs 2014

Analysis of academy school performance in GCSEs 2014 Analysis of academy school performance in GCSEs 2014 Final report Report Analysis of academy school performance in GCSEs 2013 1 Analysis of Academy School Performance in GCSEs 2014 Jack Worth Published

More information

DHL Data Mining Project. Customer Segmentation with Clustering

DHL Data Mining Project. Customer Segmentation with Clustering DHL Data Mining Project Customer Segmentation with Clustering Timothy TAN Chee Yong Aditya Hridaya MISRA Jeffery JI Jun Yao 3/30/2010 DHL Data Mining Project Table of Contents Introduction to DHL and the

More information

Data Analysis, Statistics, and Probability

Data Analysis, Statistics, and Probability Chapter 6 Data Analysis, Statistics, and Probability Content Strand Description Questions in this content strand assessed students skills in collecting, organizing, reading, representing, and interpreting

More information

Reciprocal versus Parasocial Relationships in Online Social Networks

Reciprocal versus Parasocial Relationships in Online Social Networks Noname manuscript No. (will be inserted by the editor) Reciprocal versus Parasocial Relationships in Online Social Networks Neil Zhenqiang Gong Wenchang Xu Received: date / Accepted: date Abstract Many

More information

Appendix B Data Quality Dimensions

Appendix B Data Quality Dimensions Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational

More information

Chapter 2 Quantitative, Qualitative, and Mixed Research

Chapter 2 Quantitative, Qualitative, and Mixed Research 1 Chapter 2 Quantitative, Qualitative, and Mixed Research This chapter is our introduction to the three research methodology paradigms. A paradigm is a perspective based on a set of assumptions, concepts,

More information

Identifying Market Price Levels using Differential Evolution

Identifying Market Price Levels using Differential Evolution Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand mmayo@waikato.ac.nz WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary

More information

Specific Usage of Visual Data Analysis Techniques

Specific Usage of Visual Data Analysis Techniques Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction

Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction Oliver Sutton February, 2012 Contents 1 Introduction 1 1.1 Example........................................

More information