Roles in social networks: methodologies and research issues

Mathilde Forestier, Anna Stavrianou, Julien Velcin, and Djamel A. Zighed
Laboratoire ERIC, Université Lumière Lyon 2, Université de Lyon, 5 avenue Pierre Mendes France, Bron Cedex, France
{mathilde.forestier, anna.stavrianou, julien.velcin, abdelkader.zighed}@univ-lyon2.fr

Abstract. The expansion of web user roles is, nowadays, a fact, due to the ability of users to interact, discuss, exchange ideas and opinions, and form social networks through the web. The level of interaction among users leads to the appearance of several social roles, which can be characterized as positions, behaviors, or virtual identities. These roles may develop in social networks, and they keep changing and evolving over time. This article presents a survey of the state-of-the-art approaches to the identification of roles within the context of a social network. It is shown that social roles exist as a function of each other; they appear and evolve through user interaction. Different approaches are analyzed, and additional characteristics that should be taken into account during role analysis are discussed.

Keywords: social network, social role, online discussion

1. Introduction

With the advent of Web 2.0, users have become not only consumers of information but also producers [2]. They interact with each other, participate in online discussions, exchange information and opinions, and form social networks. The level of interaction among users defines social roles, which can be characterized as positions, behaviors, or virtual identities. These roles may develop in social networks formed through exchanges and discussions in forums or Usenet newsgroups, and they keep changing and evolving over time.

Defining a social role depends on the context of the analysis. Researchers who analyze exchanges within companies [48] see the role more as a position (manager, secretary, etc.).
At the same time, the role of a person inside a web discussion is closer to a virtual identity: is the person I am talking to an expert? And if so, what level of expertise does she have [76]? Roles may be predefined (popular user, expert, etc.), or their existence may emerge from the observation of patterns of interaction.

In this paper, a survey of the state of the art is presented regarding the identification of the roles that users may have or obtain within the context of a social network. As will be shown, social roles exist and develop through user interaction: a role exists as a function of another social role.

Identifying roles inside social networks is, nowadays, significant. Knowing, for instance, who the expert is in a technical forum facilitates the extraction of the most appropriate answer to a question. Furthermore, identifying the people whose role is to influence is important, especially in the case of viral marketing [17] (sometimes referred to as "word-of-mouth"), which is based on the diffusion of information through the links connecting people in a network. People who influence the decisions of a community play a significant role in the approval or rejection of preferences and tendencies. Thus, the identification of such roles enables a better understanding and analysis of interactions within social communities. It also helps to comprehend new social relations within virtual communities, since the behaviors of users reveal, in a certain way, a role [72], and users often guide their interaction (the decision of whom to talk to) based on these identifications [69].

This paper discusses the different methodologies that have been proposed for role identification inside a social network. It begins by defining the notion of roles inside a social network, in Section 2. Section 3 presents existing approaches to the identification of non-predefined roles, while Section 4 deals with approaches concerning predefined roles, such as the role of an expert or an influencer. Section 5 discusses existing issues and challenges, followed by the conclusion in Section 6.

* Corresponding author. [email protected]

2. Social networks and roles

A social network is often represented by a graph whose nodes represent the actors of the network (people, organizations) and whose links show the relationships between them. The graph can be either directed or undirected, according to the relations it represents (e.g. friendship, co-authoring, post-replying). Moreover, the nodes may have attributes that characterize them (e.g. name and sex of an actor) and the edges may be weighted. Figure 1 shows a simple undirected network consisting of four nodes.

Fig. 1. An example of a simple social network.

The social network of Figure 1 represents an e-mail exchange among four actors (John, Mary, Peter and Sarah). The graph reveals that Mary, Peter and Sarah share e-mails, while Mary exchanges e-mails with John, who does not discuss through e-mails with Peter or Sarah. Figure 2 shows a more complex social network extracted from the analysis of a real forum [62]. The actors are the people who participate in the discussion, and the links between them represent a reply relationship (i.e. if B replies to A during the discussion, a directed link is created from B to A). Thus, the interaction among users is more comprehensible. It is evident that some individuals are not connected to the network: nobody replies to them and they do not reply to anyone else. This kind of graph representation allows us to see the community as a whole and the interaction between the actors.

Fig. 2. A social network extracted from a forum.

Such networks are studied with techniques of Social Network Analysis [58] in order to analyze the characteristics of the various actors, detect patterns and identify existing communities. Based on graph theory principles, several measures are used for analysis purposes, exploiting mainly the link structure of the network. Among these measures is the degree centrality, which is divided into two measures in the case of directed graphs: the in-degree, which is the number of incoming links of a node, and the out-degree, which is the number of its outgoing links.

A social network can be analyzed as a whole, in the sense that all of its actors are represented and the analysis attempts to identify patterns that reveal the presence of communities. It may also be seen as a projection of the network on a particular actor, together with all the neighbors within a distance that is defined a priori. In the latter case, the network is called an egocentric social network [12], since the center of the study is the individual rather than the whole network.
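As an illustration of the two degree measures mentioned above, the following minimal sketch computes the in-degree and out-degree of each actor in a directed reply graph. It assumes the link convention described for Figure 2 (a link goes from the replier to the person replied to); the function name and the four reply relations are hypothetical and are not taken from the forum of [62].

```python
from collections import defaultdict

def degree_centrality(replies):
    """Return (in_degree, out_degree) dictionaries for a directed reply graph.

    `replies` is an iterable of (replier, original_poster) pairs; each
    distinct pair is one directed link from the replier to the poster.
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for source, target in set(replies):  # count each distinct link once
        out_degree[source] += 1
        in_degree[target] += 1
    return dict(in_degree), dict(out_degree)

# Hypothetical reply relations: (who replies, to whom).
replies = [("Mary", "John"), ("Peter", "Mary"), ("Sarah", "Mary"), ("Mary", "Peter")]
in_deg, out_deg = degree_centrality(replies)
```

Actors that never appear in a pair, like the isolated individuals of Figure 2, are simply absent from both dictionaries and can be treated as having degree zero.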

In the rest of the article the roles are divided into two categories: the non-explicit roles and the explicit roles. The non-explicit roles represent social roles that are not defined a priori. Explicit roles, on the other hand, concern predefined types of actors, such as experts or influencers, or even certain roles found inside virtual web communities.

Although there is no real consensus on a global definition of a role, we can give some references on widely accepted meanings of this concept. Early research [50], mainly in the field of sociology, defines the notion of a role within a social structure. It clearly differentiates between the notions of role and position, something that has motivated several works [73,74]. The notion of position is discussed in depth by Borgatti and Everett [8]. In the following, we summarize the difference they make between a role and a position.

Definition 1 For an individual, the position is a well-defined place in a social structure.

Position is usually related to some kind of similarity: attitudes, mental health, production of scientific knowledge, etc. (for a more complete list, see [8]). Two actors occupy the same position if they are connected to the rest of the network in the same way. For instance, parent and child are both eligible positions.

Definition 2 In a social structure, a role is a set of expectations that are coupled to positions.

For instance, the role parent is associated with some expectations of what parents should do. The role child is attached to another position, with other expectations related to what children should do. Positions and roles together form a social system that generates social relations. The roles result in specific behaviors and interactions which can be observed (e.g., giving orders, sending e-mails, etc.).

According to these definitions, actors with similar roles will share common features and common patterns of relations [57], even if they do not share any direct relationship. As mentioned in [8], a society is a network among individuals, whereas a social structure is the underlying network describing the relation between positions and roles.

3. Non-explicit roles

This section presents the roles that emerge by taking into consideration either only the network structure, or just the content of the exchanges (e.g., e-mails, posts), or both the structure and the content. The main idea, here, is to use unsupervised Machine Learning techniques in order to group the data automatically into (non-predefined) role categories. From a Machine Learning perspective, the data dealt with are relational and they violate the classical independence or exchangeability assumptions that are usually made, since the observations are dependent because of the way they are connected [4].

Basically, clustering algorithms using the graph structure [20,32] and/or the textual content of the exchanged documents (e.g., e-mails, posts) can be used [7,64]. This means that no (or very little) background knowledge is used in the definition of the roles. It is assumed that the roles will emerge from the regularities found both in the structure of the human relations and in the features describing people, including the production of textual content (messages from forums, e-mails, etc.). For instance, the position of a manager in a company will be identified by the kind of vocabulary she uses in her e-mails depending on the position taken by the receiver, whether this is a secretary or another manager. Within this context, a specific behavior (e.g. sending an e-mail with frequently appearing words such as meeting, prepare, arrange) constitutes an observation that leads to the identification of the role manager-secretary.
The following section is divided into two main approaches: the first is based on blockmodels and mainly uses the structure of human relations and some predefined features, while the second is based on probabilistic Bayesian models and uses both the structure and (mainly) the existing textual content.

Identifying positions using blockmodels

Doreian et al. [20] provide a good overview of blockmodels in social networks. Blockmodeling is an algebraic framework that deals with various issues of social networks, such as the identification of communities and the relations between them, roles, etc. To this extent, the detection of roles, and of the relations between them, can be seen as an application of this general framework. Blockmodeling focuses mainly on the network structure, but it can also deal with node attributes and multiple relations [32]. Handling multiple relations, in this case, involves dealing with multigraphs.

The position can either be given by the analyst or be estimated automatically in an unsupervised way. If it is predefined, based, for instance, on a node attribute (e.g., sex: male or female), the difficulty lies mainly in calculating the ties between these positions using the observed inter-individual relations. This means that the roles are estimated given the positions. As a result, a measure should be used to quantify the equivalence [8,73] between actors and the relation strength between their positions. If the type of position is not predefined, the algorithm must calculate homogeneous clusters and inter-cluster ties at the same time, that is, positions and roles. In both cases, the notion of equivalence between the graph nodes is crucial. This survey focuses on the second case, which seems to be more appropriate for role detection.

Thus, a crucial point is to define a correct relation of equivalence depending on the task in hand. Different levels of equivalence are proposed in the literature [8,73], such as structural, strong, regular, and automorphic equivalence. The structural equivalence (referring here to the distinction made in [8,24]), usually used in social network analysis, leads to placing people into groups sharing similar interests. The notion of similarity used here relies only on sharing the same neighborhood (we may refer to the motto "friends of my friend are also my friends"). It leads to the construction of communities of similar actors, who share common characteristics such as practising hobbies, appreciating similar movies, or playing games. Moreover, it is closely related to the task of extracting cliques (sets of highly connected nodes) from graphs [21]. The structural equivalence is too restrictive to capture the abstract notion of roles.
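The restrictiveness of structural equivalence can be seen on a small example. The sketch below assumes the usual reading of structural equivalence for an undirected graph (two actors are equivalent when they have the same neighbors, the tie between the two of them being ignored); the function names and the toy network of two managers and four secretaries are hypothetical.

```python
def structurally_equivalent(adj, u, v):
    """True if u and v have the same neighbours, ignoring the u-v tie itself."""
    return (adj[u] - {u, v}) == (adj[v] - {u, v})

def structural_classes(adj):
    """Group the nodes of an undirected graph into structural-equivalence classes."""
    classes = []
    for node in sorted(adj):
        for cls in classes:
            if structurally_equivalent(adj, node, cls[0]):
                cls.append(node)
                break
        else:  # no existing class fits: open a new one
            classes.append([node])
    return classes

# Hypothetical network: two managers, each tied to their own two secretaries.
adj = {
    "m1": {"s1", "s2"}, "m2": {"s3", "s4"},
    "s1": {"m1"}, "s2": {"m1"}, "s3": {"m2"}, "s4": {"m2"},
}
classes = structural_classes(adj)
```

In this toy network the two managers end up in different classes, and so do the two pairs of secretaries, although intuitively m1 and m2 play the same role: they are tied to different people, which structural equivalence does not allow.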
On the contrary, the regular and the automorphic equivalences try to capture the sociological notion of a relational role. In this case, two actors are considered to have the same role if they are linked to people with similar roles, which allows two equivalent actors to be placed in completely different parts of the network. This naturally leads to forming groups associated with specific roles, such as parent, child, manager, secretary, etc. Contrary to the structural equivalence, the underlying clustering principle that keeps groups together is cohesion or proximity instead of similarity [8].

Methodologies

A blockmodel is a smaller, comprehensible (graph) structure coded by a square binary (0/1) matrix. This matrix, also called the image, is associated with a type of relation (friendship, work relationship, etc.). Several blockmodels can be built to explain the activity of a complex social network based on relational data. The goal of blockmodeling is to reduce a large, complex relational network into one or several images. Mathematically speaking, the objective is to build a homomorphism corresponding to the chosen equivalence (e.g. the regular equivalence).

Fig. 3. The process of blockmodeling.

To give the reader a better insight, Figure 3 shows the process from the original social network composed of 8 people (i) to the representation of the interactions between roles (v). The 0/1 matrix (ii) is rearranged by permuting both rows and columns in such a way that the data are more interpretable. Depending on the context and the application task, the permutation can be performed either under supervision (e.g., depending on the attribute values) or in a completely unsupervised way. The rearranged matrix (iii) exhibits four main blocks: three zeroblocks and a block containing some 1s. Using predefined criteria related to the chosen notion of equivalence, this leads to the blockmodel (iv) showing two positions A and B that are related as A → B. This is the case, for instance, with the roles of father and child, illustrated in the role network (v).
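The reduction from a relational matrix to an image can be sketched as follows, under two simplifying assumptions that are not prescribed by the text: the positions of the actors are already known, and a block is coded 1 as soon as it contains at least one tie (other equivalences use other criteria). The function name and the 4-actor matrix are hypothetical and do not reproduce the 8-person network of Figure 3.

```python
def image_matrix(adjacency, positions):
    """Reduce a 0/1 adjacency matrix to a blockmodel image.

    adjacency[i][j] == 1 means actor i has a tie to actor j.
    positions[i] is the position label of actor i.
    Assumption: a block (p, q) is coded 1 if at least one tie goes from p to q.
    """
    labels = sorted(set(positions))
    index = {label: k for k, label in enumerate(labels)}
    image = [[0] * len(labels) for _ in labels]
    for i, row in enumerate(adjacency):
        for j, tie in enumerate(row):
            if tie:
                image[index[positions[i]]][index[positions[j]]] = 1
    return labels, image

# Hypothetical network: actors 0 and 2 (position A) each point to one of
# actors 1 and 3 (position B); there are no other ties.
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
positions = ["A", "B", "A", "B"]
labels, image = image_matrix(adjacency, positions)
```

Here the image has three zeroblocks and a single non-zero block from A to B, which mirrors the A → B relation of the example above.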

No matter whether the positions are given or not, the main task is to enumerate the role graphs (the image graphs) and to test them on raw data. To this extent, the atomic operation is to compare them using an appropriate fitness measure [35]. This measure evaluates the closeness between the ideal image and the image currently being tested. Another possible method is to use local optimization heuristics and meta-heuristics. The algorithm CONCOR [10] builds a hierarchical tree (a dendrogram) by iteratively splitting the data into two blocks. Instead of using the structural equivalence, REGE [9] uses the regular equivalence to build blockmodels. In the end, each original node is associated with exactly one role in the image graph.

In these early works, some stochastic models have already been proposed [25,38]. However, they mainly focus on structural equivalence and they assume that a partition of people is already known. This partition can be given using the attributes that characterize the network nodes (for instance, the social category). Fienberg and Wasserman [25] developed a probabilistic model for structural equivalence of actors in a network, under which the probabilities of relationships with all other actors are the same for all actors in the same class. This can be viewed as a stochastic version of a blockmodel. It can represent clustering, but (again) only when the cluster memberships are known. Wasserman and Anderson [71] as well as Snijders and Nowicki [60] extended these models to latent classes; the difference is that these latent class models do not assume the cluster memberships to be known, but instead estimate them from the data.

Holland and Leinhardt [38] propose a probabilistic model, p1, to analyze social networks. This model does not explicitly take into account the blockmodel effect, defining implicitly a kind of simple blockmodel [70]. Their assumptions correspond simply to a stochastic version of the notion of structural equivalence.
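The idea that tie probabilities depend only on the classes of the two actors can be illustrated by sampling a network from known class memberships. This is a sketch of the general principle only, not of the p1 model or of any of the cited models; the function name, the class assignment and the probability values are hypothetical.

```python
import random

def sample_stochastic_blockmodel(classes, tie_prob, seed=0):
    """Sample a directed 0/1 adjacency matrix from a simple stochastic blockmodel.

    classes[i] is the (known) class of actor i; tie_prob[p][q] is the
    probability of a tie from an actor of class p to an actor of class q.
    """
    rng = random.Random(seed)
    n = len(classes)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # No self-ties; every pair is drawn independently.
            if i != j and rng.random() < tie_prob[classes[i]][classes[j]]:
                adjacency[i][j] = 1
    return adjacency

# Hypothetical setting: class 0 mostly addresses class 1, rarely the reverse.
classes = [0, 0, 1, 1]
tie_prob = [[0.1, 0.9],
            [0.05, 0.1]]
sample = sample_stochastic_blockmodel(classes, tie_prob, seed=42)
```

Two actors of the same class have, by construction, the same probability of being tied to any third actor, which is the stochastic counterpart of structural equivalence. The latent class models differ in that `classes` would not be given but estimated from the observed matrix.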
Wang and Wong [70] extend the p1 model with the block structure and propose a stochastic blockmodel. This model uses both the information of intra-node attributes and inter-node relations. The model parameters are classically estimated by the maximum likelihood principle. The authors apply their model to two datasets taken from the experiments of Hansell [33] and Sampson [56].

Handcock et al. propose a new latent position cluster model (LPCM) [32]. Contrary to previous approaches, they integrate the inter-individual distance into a probabilistic model. In this way, they can take into account the graph transitivity, which shows that actors tend to relate to each other when they share common attributes (e.g. age, gender, geography, race). The authors propose a Bayesian approach to estimate the best number of clusters. To validate their model, they use a friendship network among a group of 69 students taken from [34,67].

Wolfe and Jensen [75] extend previous works on stochastic blockmodels [60,70] by allowing multiple roles. They propose a probabilistic model where an interaction (say, an edge of the graph) is generated by one role each time. Airoldi et al. [4] improve the previous latent stochastic blockmodel in such a way that each object can belong, to a greater or lesser degree, to several clusters. This entails that an individual can play several latent roles at the same time, which is quite similar to, but not the same as, the proposal in [75]. The authors propose the mixed membership stochastic blockmodel (MMSB) in order to adapt the traditional models to social networks. In this proposition, each observation is associated with a membership vector that relates it to the different clusters, similarly to the clustering approaches based on mixture models. It allows capturing various aspects of the documents, such as the underlying topics. They use variational inference to estimate the parameters of the model.

Fu et al. [27] take into account the natural evolution of networks when proposing the new dynamic mixed membership stochastic blockmodel (dMMSB). This very important characteristic was never really included in the previous models, at least those using blockmodeling. It means that the actors can play various roles through time, for various reasons. The roles themselves can evolve. The model proposed by the authors is based on previous work on dynamic topic modeling [6] and on the static model that captures role correlations [4]. More precisely, it augments the MMSB model with a state space model similar to that used in Dynamic Topic Models. This new probabilistic model is able to track the evolving roles of the actors across time.

Estimating roles using probabilistic models on textual content

Another modern approach to role analysis uses unsupervised hierarchical Bayesian models, mainly on textual datasets. The works presented in this section can be seen as the convergence between Bayesian networks and social networks. The authors argue that the relational structure is not enough when analyzing textual datasets, such as e-mails, blogs, or scientific papers. The idea is, in this case, to use the textual content associated with the graph edges (e-mail, post, message) in addition to the relational structure. It relies on previous works on topic models that extract topics from texts assuming a probabilistic generative model and associate a mixture of topics with each text [7]. In this context, a topic $z$ is defined as a multinomial distribution $p(w \mid z)$ over a given vocabulary (for instance, a list of words $w$).

In this theoretical framework, several models have been proposed that use the notion of role implicitly or explicitly. Following the work of [64] with the Author Topic model (AT), McCallum et al. [48] propose three Bayesian hierarchical models to deal with roles in e-mail datasets. The Author Recipient Topic model (ART) is a directed graphical model of the words in a message, generated given their author and a set of recipients. In this model, the role is implicit, because it lies in the set of topics (in other words, the kind of vocabulary) associated with each tuple (author, recipient). The authors of [48] improve their model with two variants named Role Author Recipient Topic models (RART1 and RART2). In these two models, the role is explicitly modeled in the Bayesian network as a latent random variable. A role is therefore a topic mixture characterizing the relation of two persons (that is, the author and the recipient).

Recent work introduces the task of Conference Mining [14]. In this work, the authors focus on the specific role of experts or mavens (they use both terms interchangeably). Roughly speaking, additional dimensions, such as the time and the identity of the source, are added in order to provide a global analysis of topical trends in the scientific literature. The idea behind this is to consider the semantics-based intrinsic structure of the words present between conferences, following a topic-model point of view.
Methodologies

Figure 4 shows the Bayesian generative models that underlie the Author Recipient Topic (ART) model and the (first) variant Role ART (RART1). These graphical models try to capture the generative process that builds the dataset. Whereas the Author-Topic model estimates a mixture $\theta$ of topics $z$ for each author $a \in A$ individually, the ART model associates each author-recipient pair $(a, a')$ with a mixture $\theta$. In other words, ART conditions the per-message topic distribution jointly on both the author and the individual recipients. This makes it possible to describe the relation between people using this mixture of topics. An estimation procedure similar to a clustering process then makes it possible to discover people's implicit roles based on this relation.

Fig. 4. The models ART (left) and RART1 (right) proposed in [48].

The two variants of RART are both extensions of ART with an additional level of latent variables. These additional latent variables $g$ (the author's role) and $h$ (the recipient's role) represent roles associated with people in the network, and the models explicitly estimate the distributions $p(r \mid a)$, $r \in \{g, h\}$. A person $a$ can have multiple roles $r$ simultaneously, and therefore she is associated with a mixture $\psi$ over roles. The generation of the words $w$ does not depend on the authors, but rather on their roles.

Figure 5 shows the output of the learning process based on a RART-based model. Given a predefined number $k$ of roles (here, 3 roles) and the textual exchanges, such as e-mails, we obtain a $k \times k$ matrix of topic mixtures. Each element $\theta_{i,j}$ of this matrix is a mixture of topics. $\theta_{i,j}$ can be seen as the kind of subjects a person playing the role $r_i$ uses when she talks with a person playing the role $r_j$. In this model an author can play several roles, depending on the recipient she is writing to. This is the reason why the author $a$ is associated with a mixture $\psi$ over roles. Figure 6 illustrates the multinomial distribution associated with a specified person with 3 roles.

Fig. 5. The role network estimated by a RART-based model.

Learning this kind of probabilistic model means estimating the (often numerous) model parameters.
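To make the generative view concrete, the following minimal sketch draws the words of one message under an ART-style process. It assumes that each word first picks one recipient uniformly at random, then a topic from the mixture of the (author, recipient) pair, then a word from that topic. The function name, the toy parameters `theta` and `phi`, and the two-topic vocabulary are hypothetical and serve only as an illustration.

```python
import random

def generate_message(author, recipients, theta, phi, n_words, seed=0):
    """Generate the words of one message under an ART-style process.

    theta[(author, recipient)] is a list of topic probabilities for that pair;
    phi[z] is a dict mapping each word to its probability under topic z.
    """
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        recipient = rng.choice(recipients)  # uniform choice among recipients
        topic_weights = theta[(author, recipient)]
        topic = rng.choices(range(len(topic_weights)), weights=topic_weights)[0]
        vocabulary, word_weights = zip(*phi[topic].items())
        words.append(rng.choices(vocabulary, weights=word_weights)[0])
    return words

# Hypothetical parameters: topic 0 is about scheduling, topic 1 about budgets.
phi = [
    {"meeting": 0.5, "prepare": 0.3, "arrange": 0.2},
    {"budget": 0.6, "forecast": 0.4},
]
theta = {
    ("manager", "secretary"): [0.9, 0.1],
    ("manager", "director"): [0.2, 0.8],
}
message = generate_message("manager", ["secretary", "director"], theta, phi, n_words=6)
```

In practice `theta` and `phi` are not known: they are precisely the quantities that have to be estimated from the observed messages.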

Fig. 6. The distribution of the author $a$ over the 3 roles $R = \{r_1, r_2, r_3\}$.

There are two main approaches to estimate these parameters, which are related to the classical inference problem in Bayesian networks. The first one is approximate inference with variational methods, used in [6,7]. The second one is based on Monte Carlo simulation methods, such as Gibbs sampling [64]. The authors of [48] evaluate their models with both the Enron corpus and the personal correspondence of McCallum. The evaluation is based on a qualitative assessment and on a quantitative measure that estimates the predictive power of the models. For instance, the predictive power of such models can be estimated using the perplexity measure, based on the maximum likelihood principle.

In the task of Conference Mining, Daud et al. [13] propose an original model for discovering the latent topics between the authors, the venues (conferences or journals) and time simultaneously. The authors call their model STMS, as in semantics and temporal information-based maven search. Discovering mavens, i.e. people with a given expertise, is just a by-product of this model. Given a query $q$ composed of a limited number of words $w$ defining the area of expertise, the authors $a$ are ranked by their probability values $p(a \mid q)$. By the Bayes rule and some usual assumptions, $p(a \mid q)$ is proportional to $p(q \mid a)$, which is equal to $\prod_{w \in q} p(w \mid a)$. The point is that the probability $p(w \mid a)$ is calculated on the topic basis described by the generative model: $p(w \mid a) = \sum_{z} p(w \mid z)\, p(z \mid a)$. The quantities $p(w \mid z)$ (distribution of the words $w$ given the topic $z$) and $p(z \mid a)$ (distribution of the topics $z$ given the author $a$) are usual quantities already present in previous models, such as LDA [7] and AT-based models [64].

Summary

Traditional blockmodels are more closely related to graph theory. They are perhaps better suited to social networks. They mainly use the relational structure, but ignore the content of the exchanged messages. The evolution of such graph-based models towards probabilistic models is a first step towards using more local information. Topic-based Bayesian models do use this crucial textual information, but they lack the global view of blockmodels. The convergence of both kinds of models is considered a challenge nowadays.

4. Explicit roles

In some cases, the role is predefined and its identification inside a social network amounts to detecting certain criteria that are satisfied by some users. This section deals with two roles that receive great attention in the existing literature: the experts and the influencers (or influentials). In addition, other explicit roles that may exist inside online discussion groups are discussed.

Identifying experts

Several works deal with the identification of experts inside a social network. These works use various metrics in order to identify expertise and they define the experts as follows:

Definition 3 An expert is a person who has knowledge about a topic discussed inside a social network, and, as such, her opinions and ideas can be trusted.

The identification of expertise appeared early in the literature, independently of the existence of a social network [49,65]. TREC 2006 [61] also proposed an expert finding task, where most participants used Information Retrieval techniques to identify experts. In this article, the focus is on the identification of expertise within social networks and especially within online communities, since the experts are the people to whom the other social network members will turn in order to seek advice or help. The need for experts is often seen in forums dealing with technical or even health issues.
Examples include Microsoft Answers and the Technology Network Community of Oracle. Within such online communities, the postings are questions or answers on a certain subject. Knowing the experts facilitates the identification of the answers that are more likely to be correct and/or complete. Moreover, distinguishing the quality replies among hundreds of other postings allows a reader to quickly find the postings worth reading.

Methodologies

Forum question-answering is at the center of the study in [1]. The authors study expertise across various topics by identifying patterns and behaviors regarding how people ask and answer questions through postings inside a forum. They point out that the links in such a social network show the topics that a user is interested in rather than her expertise.

Techniques for the identification of expertise inside forums are proposed in [76]. The post-reply community analyzed is the Java Forum, which is represented as a directed graph where the edges show a reply relation between two users (the nodes). The experts are considered to be the actors who can answer a question appropriately, and the measures used to rank these experts are the following:

1. the outdegree, which designates how many replies a user sends,
2. the indegree, which shows to how many different people a user answers,
3. the variation between asking and replying (for the same user), measured by the z-score,
4. a PageRank-based measure, which takes into account the person to whom the answer is sent.

For example, a user who replies is generally considered to be more of an expert than the user who receives the reply. Moreover, a person who replies to an expert becomes an expert herself. Applying these measures showed that the structure of the network leads to different expertise ranking results.

Expertise is usually related to a topic, even if some network actors may be experts in multiple topics. Identifying experts on a specific topic is dealt with in [5] by constructing topical profiles that show the probability of a person being an expert in a particular skill. This probability is calculated using vector-based similarity techniques. Each skill is characterized by a vector of keywords. Each person is considered to be relevant (or not relevant) to a skill according to whether she is present (or absent) inside documents related to that specific skill.
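Returning to the third measure of the list used in [76], the z-score can be sketched as follows. The sketch assumes the form z = (a - q) / sqrt(a + q), where a is the number of answers and q the number of questions posted by a user; this formula, the function names and the toy postings are assumptions made for illustration and are not given in the text above.

```python
import math
from collections import Counter

def z_score(n_answers, n_questions):
    """Assumed form of the z-score: (a - q) / sqrt(a + q), and 0 for inactive users."""
    total = n_answers + n_questions
    if total == 0:
        return 0.0
    return (n_answers - n_questions) / math.sqrt(total)

def rank_by_z_score(posts):
    """Rank users by z-score, highest first.

    posts is a list of (user, kind) pairs with kind in {"question", "answer"}.
    """
    counts = Counter(posts)
    users = {user for user, _ in posts}
    scores = {
        user: z_score(counts[(user, "answer")], counts[(user, "question")])
        for user in users
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical forum activity.
posts = (
    [("ann", "answer")] * 4
    + [("bob", "question"), ("bob", "answer")]
    + [("cat", "question")] * 2
)
ranking = rank_by_z_score(posts)
```

Under this assumption, a user who only answers gets a positive score, a user who only asks gets a negative one, and a user who does both equally often scores zero.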
The presence may be indicated by having been mentioned in, or having authored, a document.

In [16] the network actors are people who exchange e-mails, and the objective of the work is to extract the experts on a certain subject discussed through these e-mails. The authors deal with the relative rather than the absolute expertise of two individuals, determining which of the two has more knowledge on a certain topic. They use PageRank [52] or HITS-based measures [45] and conclude that PageRank outperforms all other algorithms.

Identifying influencers

Another role whose identification inside a social network has received a lot of attention is that of an influencer:

Definition 4 An influencer (or influential) is a person who has the ability to influence the decisions or thoughts of other people inside a social network.

The influencers are the right people to market to [17], the "market-movers" [3]. They are the ones that can accelerate the diffusion of innovation, whether this involves the launch of new products or novel marketing, social, and political ideas. Knowing the influencers can reduce the lag between knowing (being informed) and doing (accepting and applying a new idea), so that the spread of new ideas becomes quicker and more efficient [68]. In the case of blogs, which are nowadays regarded as a major way of spreading information, identifying influencers can lead to the extraction of the most representative posts of a blog site. Additionally, a reader of a community blog (one where many authors contribute content) may give priority to the posts written by the influencers instead of reading all posts [2,3].

Influence is often measured by the number of people being influenced. The influenced people are usually linked to those who influence them through a relation such as friendship, collaboration, etc. Influence may diminish or increase over time and, thus, Agarwal et al. [3] define four types of influencers according to the temporal length of their influence (long-term, average-term, transient and burgeoning influencers).

Methodologies

The identification of influencers in social networks is dealt with in several works, which take into consideration various parameters based mainly on the behavior of users inside the network. Influencers may be identified inside community blog sites [3] or even in collaborative systems via implicit influence between users [18,36]. Domingos [17] points out that influence is asymmetric: a person may influence others more than she is influenced, which may give her priority over others in the influence ranking. In the same work, it is noted that the network value of a person depends on the network value of the people she influences. For example, a person who influences people that can, in turn, influence others is considered to have high influence power. Thus, the importance of an influencer is also based on her indirect influence over people [18,55].

Several parameters are considered in the existing literature for the identification of influencers. In [3], a community blog influencer is identified by: the degree of recognition by others, measured by the number of inlinks towards a post; the user activity, specified by the number of posted comments; the novelty of ideas, measured by the number of outlinks; and the length of the posts. The degree centrality of users in a network (i.e. how popular they are) as well as their activity history (the number of groups they participate in, the average amount of updated content per day, etc.) are also used in [44] in order to identify influencers. Furthermore, the propagation of trust and credibility inside communities [47,51,77] is considered to be related to the ability of someone to be an influencer.

In [68] the adoption of a new idea is said to be influenced by the direct ties of an individual inside the social network and, moreover, by her position in the network (structure/hierarchy). The opinion leaders (i.e. influencers) are identified by extracting the most popular actors from the network (e.g. the ones from whom people seek advice) and matching them against the members who are closest to them (e.g. if B takes advice from A, and C from B, then, based on transitivity, C is considered to get advice from A). Rohan et al. [66] focus on the identification of influencers inside communities with the purpose of placing advertisements on their profiles.
According to this work, the criteria that can characterize someone as an influencer of a social network community are: the popularity of the network actor within the community, the number of friends, the group membership, the number of user interactions, the quality of content in the user profile, the common interests with the other community members (based on the user profile), whether a dynamic change in the size of the community is involved, and the activity inside a user profile. The users are ranked according to influence with a PageRank-based algorithm. The influence is measured according to the weight of the cluster (declared friendships in cluster/total friendships in and out of cluster). The change in the weight of the cluster when a member is removed reveals that member's influence: the higher the change, the higher the influence.

Influence and communities are also discussed in [59], where the authors focus on influence maximization. They propose the identification of events such as buying a product or adopting a new idea, since such events have the ability to influence neighboring user-nodes. Agarwal et al. [2] differentiate between identification and propagation of influence. They point out that the location of a node in a social network reveals how influence can spread rather than which node has the ability to influence. As a result, it may be easy to identify nodes that could influence, but without being able to show whether they actually will. In contrast, influencers are easier to identify through blogs, since there is user-produced content such as comments.

Based on the different methodologies proposed in the existing literature, researchers interested in the identification of expertise or influence should take into account the properties that appear in Table 1.

Identifying social roles in online discussion groups

Online discussions include forums, blogs, e-mails, etc.
Although they have different formats, they share the characteristic that people interact through them by using a virtual structure, which makes it possible to measure their participation and behavior. Sociologists have studied the different kinds of roles that may appear. Golder and Donath [30] carried out an ethnographic study of behavior and social roles on the Internet and proposed a typology of social roles in virtual discussions, described in Table 2. The authors defined a general typology of social roles in online communities regardless of context (i.e. the type of forum: politics, help, question/answer, etc.).

Table 1. Social network properties that should be considered by researchers during expert/influencer identification.

Property: Activity of user in the network
Explanation: Whether the user is active, e.g. within a community blog.
Measure: Updated content, number of groups in which people participate, number of postings sent, community leaders (e.g. the ones who have created the community), maximization of the number of communities influenced [59].

Property: Content of postings
Explanation: Content needs to be of high quality and relevant to the interests of a community.
Measure: Keyword extraction [22]. Sometimes the length of the post shows quality [2,3].

Property: Expertise (applies only to influencer identification)
Explanation: People who have knowledge influence more quickly, since they can be trusted and people often ask them for advice.
Measure: This is something gained and recognized by others. It could be found through expert-finding methods, taking into account the network structure and the activity/interactions of each user inside the network.

Property: Influence
Explanation: Ability to influence those who can also influence in their turn [17]; degree of influence versus being influenced.
Measure: Reach [2] (how other members of the network can be reached).

Property: Network structure
Explanation: Not all links have the same importance/weight.
Measure: Direct ties inside the network, connectivity, popularity, position inside the network, hierarchy of network [68], degree, closeness, betweenness, clustering coefficient, etc., relation with others (friends, family).

Property: Novelty of ideas
Explanation: Presence of ideas/opinions not discussed before.
Measure: Radiality [2], e.g. to how many posts does the specific posting refer?

Property: Recognition by others
Explanation: Is a certain posting referenced by others? (inlinks) [2]
Measure: Count citations (of blog posts, for example): have these people been quoted, and by how many (quality) posts/articles/blogs?

Property: Trust and credibility
Explanation: Does the information come from someone familiar? Is this person generally trusted inside the community? Are her comments of good quality? Has she always been trusted and reliable in the past?
Measure: Quality of response [2], past experience (relative to the length of presence inside a network: people new to a network are usually trusted less) [2].

According to Welser et al. [72], each social role has a certain signature that can be understood as the set of behavioral and structural patterns of people's participation. They identify and study two social roles that appear in newsgroups: the answer people and the discussion people. The role of answer people consists in replying to threads initiated by others, and in replying to many different people (answer people do not usually reply several times to the same person). In contrast, the discussion people belong to a very dense egocentric network and reply to threads initiated either by themselves or by others. These social roles are analyzed through several measures:
- Authorlines, which represent the volume of contribution of a single actor across all the weeks of a given year and differentiate the threads initiated by the given actor from those that are not [69],
- local network neighborhoods, which represent the ego network of each conversation participant,
- the distribution of neighbors' degree, a histogram showing the distribution of the neighbors' degree for each actor,
- the coding of behavior from message content: questions, answers, answering-related behavior and discussions.

Fisher et al. [26] argue that social roles may depend on the context. Indeed, social behavior can differ if a user posts in a help forum as opposed to a flame forum. Based on this, the authors focus on individual and collective behavior inside different newsgroups.
They construct a second-degree egocentric network of the online discussion in order to explore individual behavior, and they calculate the distribution of each neighbor's out-degree (the in-degree is the number of actors who replied to an actor and the out-degree is the number of persons to whom an actor has replied). This measure explains how an author responds to an actor who is poorly or well connected to others. In a way, it shows whether the specific author talks to people who reply to a lot of other people. In order to identify these roles, they mainly use the in- and out-degree statistics and the degree distribution coefficient. Fisher et al. conclude that in a discussion newsgroup people build a reputation through high participation, while in a question/answer group people build their reputation by sending more answers (rather than questions).

Table 2. Typology of social roles [30].

celebrity: The prototypical central figure; prolific posters who spend a great deal of time and energy contributing to their newsgroup's community.
newbie (new user): Little communicative competence and maybe few similarities with the rest of the group.
lurker: Reads the newsgroup's conversations without participating.
flamer: The key behavior strategy is intimidation through very aggressive language, yelling and controversial speech.
troll: Pretends to be someone else and makes others believe so.
ranter: Posts with a high frequency and can be confused with a celebrity, with the difference that a ranter does not participate in conversation threads not initiated by herself.

In [37], Himelboim et al. study social roles in political discussions and define the social role of a discussion catalyst. This social role refers to an individual who influences the information that enters a newsgroup and affects the evolution of the discussion within it. They evaluate the posting behavior by three measures:
- the reply share: the proportion of replies in the threads initiated by an author to the total number of replies in the newsgroup,
- the replier share: the proportion of the newsgroup authors who post messages in an author's threads to the total number of newsgroup participants,
- the success ratio: the proportion of threads an author initiates which have received replies from at least two other authors.
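The three measures of [37] can be computed directly from thread data, as in the following minimal sketch. The thread representation, the function name and the choice to ignore an author's replies to her own threads when counting repliers are assumptions made for illustration, not details taken from [37].

```python
# Hypothetical newsgroup: each thread has an initiator and the authors of its replies.
threads = [
    {"initiator": "ann", "repliers": ["bob", "cat", "bob"]},
    {"initiator": "ann", "repliers": ["dan"]},
    {"initiator": "bob", "repliers": []},
    {"initiator": "cat", "repliers": ["ann", "dan"]},
]

def catalyst_measures(threads, author):
    """Return reply share, replier share and success ratio for one author."""
    own = [t for t in threads if t["initiator"] == author]
    total_replies = sum(len(t["repliers"]) for t in threads)
    participants = {t["initiator"] for t in threads} | {
        r for t in threads for r in t["repliers"]
    }
    own_replies = sum(len(t["repliers"]) for t in own)
    # Assumption: the author's replies to her own threads do not count as repliers.
    own_repliers = {r for t in own for r in t["repliers"]} - {author}
    # A thread is successful if at least two other authors replied to it.
    successful = sum(1 for t in own if len(set(t["repliers"]) - {author}) >= 2)
    return {
        "reply_share": own_replies / total_replies if total_replies else 0.0,
        "replier_share": len(own_repliers) / len(participants) if participants else 0.0,
        "success_ratio": successful / len(own) if own else 0.0,
    }

ann = catalyst_measures(threads, "ann")
bob = catalyst_measures(threads, "bob")
```

In this toy newsgroup `ann` scores high on all three measures, while `bob`, whose only thread received no reply, scores zero on each.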
A discussion catalyst has a high reply share, replier share and success ratio.

Also in the context of political discussions, Kelly et al. [42] explore three social roles: the fighters, the friendlies and the fringe. The fighters are the great majority of actors, who respond more often to opponents than to allies. The friendlies are a smaller group of actors who respond to allies more often than to opponents. Finally, the fringe is a marginal group that raises interesting questions for qualitative study. In order to identify these three social roles, Kelly et al. analyse political newsgroups and focus on the in-degree and out-degree egocentric networks, with each node carrying the actor's political affiliation.

Viegas and Smith [69] propose a new interface named "Newsgroup Crowd" in order to automatically visualize which actors are important and which seem less important inside newsgroups. Their study involves two levels of the concept of social roles: the social roles within a newsgroup and the social roles across newsgroups. They identify the importance of actors by evaluating the following characteristics:
- the number of days during which an author has been active in a certain time period,
- the author's average number of postings per thread in a newsgroup,
- how recently an author has been active in the newsgroup and her overall posting activity in Usenet as a whole,
- the author's number of postings during a certain time period for a particular newsgroup,
- the author's total number of postings in the whole set of Usenet newsgroups,
- the first and the last day that the author was seen in a specific newsgroup,
- the top five newsgroups where an author has been active,
- Authorlines.
Based on these characteristics, the typology of roles is presented in Table 3 for the authors that participate in newsgroups, as well as across newsgroups (second part of table).
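As an illustration of how the within-newsgroup part of Table 3 could be turned into a rule, the sketch below maps two of the characteristics above (number of active days and postings per thread) to a role. The function name and all three thresholds are hypothetical: the typology only speaks of "low", "moderate to high" and "very high" values, so the cut-offs would have to be tuned on real data.

```python
def classify_author(active_days, posts, threads,
                    min_active_days=30, low_ratio=1.5, very_high_ratio=4.0):
    """Map activity statistics to a within-newsgroup role (hypothetical thresholds)."""
    ratio = posts / threads if threads else 0.0  # postings per thread
    many_days = active_days >= min_active_days
    if many_days and ratio <= low_ratio:
        return "answer person"
    if many_days and ratio >= very_high_ratio:
        return "debater"
    if not many_days and ratio <= low_ratio:
        return "newcomer or question asker"
    if not many_days:
        return "bursty contributor"
    # Many active days with a moderate ratio is not covered by the typology.
    return "unclassified"

roles = [
    classify_author(120, 150, 130),  # many days, about one post per thread
    classify_author(90, 400, 50),    # many days, eight posts per thread
    classify_author(5, 40, 8),       # few days, five posts per thread
    classify_author(3, 3, 3),        # few days, one post per thread
]
```

The explicit "unclassified" branch makes visible that the two dimensions do not partition all authors, which is one reason such rules can only approximate the typology.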
The measures used by the authors to extract social roles from online discussions are summarized in Table 4.

Table 3. Typology of roles for the authors that participate in newsgroups [69].

Roles within a newsgroup:
answer person (or pollinator): High number of active days, and a low postings per thread ratio.
debater: High number of active days, and a very high postings per thread ratio.
bursty contributors: Low number of active days, moderate to high postings per thread ratio.
newcomers and question askers: Low number of active days, and a low postings per thread ratio.

Roles across newsgroups:
answer person (or pollinator): High number of active days while he mostly responds to threads started by other authors with one or few messages sent to each thread.
debater: High number of active days while he mostly responds to threads started by other authors by sending a large number of messages per thread.
spammer-like behavior: High number of active days, almost entirely initiates threads which then receive no follow-up messages from this author.
balanced conversationalist: Initiates about as many threads as he replies to and has the same postings per thread ratio on both initiated and non-initiated threads.

Table 4. Properties that should be considered for social role identification from online discussions.

Property: Egocentric network
Explanation: It helps to understand the place of the individual in relation to her neighbors. It is a more accurate view of the social network based on the individual.
Measure: Degree, distribution of neighbors' degree, etc.

Property: Network structure
Explanation: The place and importance of the individual in the social network.
Measure: Link structure: in-degree and out-degree.

Property: Content of postings
Explanation: The kind of posting (i.e. question post, answer post, etc.).
Measure: Some authors have manually categorized the post content to categorize actors.

Property: Thread analysis
Explanation: Reconsider the actor's participation in the context (i.e. the thread).
Measure: Reply share, replier share, success ratio, Authorlines.

Property: Activity of the poster in the discussion
Explanation: It measures the actor's activity in the discussion and in the group.
Measure: Actor's average number of posts per thread in a newsgroup, number of posts in a newsgroup, number of posts in Usenet, number of active days, etc.

Discussion

The aforementioned methodologies are summarized in Tables 5 and 6, for non-explicit and explicit roles respectively. Table 5 points out the name of the model proposed by each author and specifies whether the approach is probabilistic, whether it focuses on extracting one or several roles, and whether it takes into consideration the structure, the content and the time. The last column is used to add further information on the particular model. Table 6 concentrates on explicit roles and distinguishes between content analysis and the use of user behavior in order to identify a role. The terms probabilistic and stochastic are used interchangeably. Even if their meaning is not equal, their usage is quite equivalent in our context.

Based on these approaches, it is evident that identifying roles inside social networks is a research issue that still presents several challenges. The complexity of human behavior on the Web, and the human reactions and interactions within online discussions that form social networks, make the task of extracting and identifying social roles difficult to achieve. However, patterns characterizing each type of role can be identified among individuals with a certain type of behavior. In this section, we discuss the existing approaches and we present issues related to the extraction of roles in social networks.

Social roles through interaction. One important notion is that a social role can mainly be identified through the interactions among people [29]. A person has a role in relation to something or someone. Even though some approaches may use additional a priori information (e.g.
sex, postings, etc.), most of the approaches focus on the social network links which represent the interactions between individuals (e-mail exchanges, postings in forums). In this way, the social role of an actor is analyzed only relative to the other roles, i.e. the expert of a network is automatically more expert than the rest of the network actors. Moreover, interactions include communication codes. For instance, people adapt their vocabulary depending on whom they talk to, and they do not speak with their supervisor as they speak with their colleagues. The same holds for posts sent during an online conversation. In this context, Donath [19] specifies that through the text of an author, it is possible to see how she interacts within an online environment. A community shares some linguistic codes that are difficult for a newcomer to understand. For example, some abbreviations are common to the whole community of writers and others are more specific to a group. These codes allow individuals to recognize each other inside the community and to protect themselves from external attacks (troll, flamer). Therefore, it is important to take the interaction content into account for social role extraction. This interaction content is used in some probabilistic and block models but not yet for social role extraction in web discussions.

Table 5. Summary of the non-explicit role identification approaches, pointing out the properties on which each method is focused. B stands for generic blockmodel. sb stands for stochastic blockmodel.
Columns: Ref; Authors; Year; Model; Structure; Content; Probabilistic; Several roles; Temporal roles; Comment.
[46,35,74]; various authors; B; x; Among the first works dedicated to blockmodeling.
[38]; Holland, Leinhardt; 1981; p1; x x; Extends the classical blockmodels to a probabilistic framework.
[71]; Wasserman, Anderson; sb; x x; Extends the model p1 to include latent classes.
[70]; Wang, Wong; 1987; sb; x x x; Takes into account the attributes' values.
[24]; Faust; 1988; B; x; Comparison of several methods for traditional blockmodeling (structural and general equivalence).
[60]; Snijders and Nowicki; 1997; sb; x x; An alternative model for [71].
[75]; Wolfe and Jensen; 2004; sb; x x x; Each individual can play several roles.
[48]; McCallum et al.; ART, RART1, RART2; x x x x; Topic models based on both textual content and structure, which embed the notion of role.
[32]; Handcock et al.; LPCM; x x x; Takes into account the transitivity within clusters and the homophily on attributes.
[4]; Airoldi et al.; MMB; x x x; This work can be viewed as a first attempt to merge topic models and block models.
[13,14]; Daud et al.; STMS; x x x x; Topic models for conference mining.
[27]; Fu et al.; dmmsb; x x x x; Extends the MMB model to take the temporality of the data into account.

Table 6. Summary of the explicit role identification approaches, distinguishing between behavior-oriented and content-oriented methods.
Columns: Ref; Authors; Year; SNA; Participation Behavior; Content Analysis; Social Role.
[16]; Dom et al.; x; Expert identification.
[5]; Balog and De Rijke; 2007; x; Expert identification.
[76]; Zhang et al.; x x; Expert ranking.
[1]; Adamic et al.; x x; Expert identification.
[68]; Valente et al.; x; Influencer.
[17]; Domingos; 2005; x; Influencer.
[59]; Scripps et al.; x; Influencer.
[3]; Agarwal et al.; x; Influencer.
[66]; Rohan et al.; x x; Influencer.
[30]; Golder and Donath; 2004; x x x; Celebrity, Newbie, Lurker, Flamer, Troll and Ranter.
[69]; Viegas and Smith; 2004; x x; Answer person, Debater, Bursty contributor, Newcomers or Question asker, Spammer-like behavior and Balanced conversationalist.
[26]; Fisher et al.; x x; Questioner and Replier person.
[42]; Kelly et al.; x x; Friends, Foes and Fringes in political discussion.
[72]; Welser et al.; x x x; Answer and Discussion person.
[37]; Himelboim et al.; x x; Discussion Catalysts.

Text content. At the moment, the identification of experts and influencers inside online communities is mainly link- or activity-based. The semantic content is not really taken into account. However, the content of the posts can reveal a lot of information regarding the role of the respective authors. Text Mining [23,41,63] as well as Opinion Mining techniques [15,28,40] may be applied in order to identify patterns and the evolution of topics and opinions. The quality of the posts [22,39] may reveal the experts, and the evolution of opinions inside the whole network may facilitate the identification of the social network actors who influence.

One or several social roles? As mentioned above, some approaches aim at extracting one social role for each actor, while others assume several roles per actor. This depends on the type of role to be extracted. Indeed, probabilistic models and blockmodels attempt to identify social roles as positions: in a company, for example, people can supervise while being supervised, so an actor can effectively play several roles. On the other hand, during a web conversation (e.g. a forum), the social role can be seen as a reputation [19,54]. Therefore, individuals have only one social role, defined by their participation in the relevant discussions. For example, in [76] the identification of the expertise level per actor leads to a ranking of experts. The same actor cannot be an expert and a non-expert at the same time, even though the same actor may change roles over time. At each time instance, there is only one role per actor.

Temporal dimension of social roles. A role may be dynamic, since it may change over time. This has to be taken into account during the role identification process. For instance, an influencer in a domain may stop influencing after a certain time period. Similarly, the identified expert of a technical network may be ranked lower when another expert becomes a member of that network. As a result, topic and temporal criteria need to be incorporated into proposed approaches.
Unlike work in communities [31,53], the evolution of social roles over time is scarcely reflected in the articles cited. Although only Fu et al. [27] raise the question of the temporal evolution of social roles, it seems evident that people do not have the same social role over time. For example, it could be interesting to analyze the expertise level of a Java forum participant who begins as an inexperienced user and gradually becomes a Java professional or expert.

Intra- and inter-community roles. It is worth noting that, apart from the roles described in this article, there exist roles that are defined by the structure of the communities to which the actor belongs [11,59]. For example, Scripps et al. [59] emphasize the position of a node within the community structure of the network. As a consequence, the role of a user depends not only on her behavior towards her neighbors, but also on the communities of which these neighbors are part. The user's behavior is measured by the popularity of the node (degree), social network analysis measures (closeness, betweenness), the rank (PageRank, HITS) and a new measure that they propose, which gives information about how a node is related to the communities of the network. Based on her position within the community, an actor may be a kind of bridge passing information from one community to another, or someone with a lot of links within the community. Similar approaches use a set of social network analysis metrics (e.g. centrality, betweenness) to extract the individuals who play a role. The identification or extraction of social roles may be enhanced with the use of such measures.
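To make the contrast between a bridge and a well-connected community member concrete, the following minimal sketch computes the degree and the betweenness of each node on a toy undirected graph, together with the number of distinct communities found among a node's neighbors. The graph, the community labels and the neighbor-community count are hypothetical; the count is only a simple stand-in for illustration and is not the measure proposed in [59]. Betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque

# Hypothetical undirected network: two triangles joined through node "d".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
# Hypothetical community labels (in practice the output of a community detection step).
community = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 2}

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def betweenness(adj):
    """Brandes' algorithm for unnormalized betweenness on an unweighted graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        paths = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    return {v: b / 2.0 for v, b in score.items()}  # each undirected pair is seen twice

degree = {v: len(neighbors) for v, neighbors in adj.items()}
between = betweenness(adj)
communities_touched = {v: len({community[w] for w in adj[v]}) for v in adj}
```

Node `d` has the lowest degree in the graph but the highest betweenness and neighbors in both communities, which is the profile of a bridge; `c` and `e` have the highest degree, which is the profile of a node with many links inside its community.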

Evaluating approaches. When the roles are explicit and unambiguous (e.g. the role of a mother, a manager, etc.), or when they are based on well-defined criteria (e.g. maximum number of inlinks in a network), evaluating a role identification technique is quite straightforward. Nevertheless, when the criteria of a role include subjectivity, as in the case of ranking experts or influencers within a social network, organizing experiments and evaluating results is not straightforward [2]. In this context, the evaluation of role identification methodologies presents a research challenge.

The evaluation of influence identification can be done by checking whether people who are supposed to be influenced are indeed influenced. Throughout the literature, there are some evaluation proposals. For instance, web sites that host liked user posts (such as digg.com) are used [3], on the assumption that posts of influencers are often liked and, as a result, may appear on such sites. Simulations showing that new ideas diffuse more quickly when they are initially directed towards the opinion leaders are proposed in [68], as well as evaluations of systems by propagating a new game through Facebook [44]. The latter focused on analysing the total number of users who played the game as well as the number of accepted invitations sent by influencers. Furthermore, the Independent Cascade Model [43], which is based on the probability with which an activated node will activate its neighbors, has been used in [59]. Regarding the identification of expertise, human raters are usually asked to participate [76]. These raters read forum posts in order to rate the expertise of each of the users who wrote them.

6. Conclusion

This survey article presents the state of the art of approaches to the identification of roles within a social network. Roles may be predefined, based on certain criteria such as a maximum number of out- or inlinks. Roles may also emerge from the link structure of the network.
In any case, the extraction of roles is significant for various reasons, ranging from marketing/industrial ones (e.g. the case of viral marketing) to user-oriented interests (talking to or avoiding certain people inside forums). The status of a role depends on the context. A person has a role in relation to something or someone. Approaches such as the blockmodel and the probabilistic model reflect a more objective reality, in the sense that the role of a manager or the role of a child is based on definitions accepted by everyone. On the other hand, approaches that aim at the identification of roles (e.g. experts or influencers) within online discussions are more subjective, since it is not always straightforward to rank two actors that have similar characteristics. The social role of actors who participate in online discussions depends on their interests, their activity and their recognition by others, and these characteristics are not defined in the same way by everyone.

The majority of the current approaches whose objective is to identify roles inside communities are based on the link analysis of the social network. Future work aiming to enhance such approaches should consider additional dimensions such as the temporal one, the content of the exchanged messages (existing opinions, vocabulary, etc.), and the presence and influence of actors in communities they do or do not belong to. Text and Opinion Mining techniques should be involved in the analysis of actor-generated content, the evolution of interactions through time should be taken into account, and the way in which a community may affect the emergence of different roles should be considered. Moreover, benchmarking methodologies should be studied in order to facilitate the evaluation of such methods.

References

[1] L.A. Adamic, J. Zhang, E. Bakshy, and M.S. Ackerman. Knowledge sharing and yahoo answers: everyone knows something.
In: Proceeding of the International Conference on World Wide Web (WWW'08), pages , Beijing, China, ACM Press.
[2] N. Agarwal and H. Liu. Blogosphere: research issues, tools, and applications. SIGKDD Exploration, 10(1):18-31, IEEE Press.
[3] N. Agarwal, H. Liu, L. Tang, and P.S. Yu. Identifying the influential bloggers in a community. In: Proceedings of the International Conference on Web search and web data mining (WSDM'08), pages , Stanford, CA, USA, ACM Press.
[4] E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9: , JMLR.
[5] K. Balog and M. De Rijke. Determining expert profiles (with an application to expert finding). In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI'07), pages , Hyderabad, India, AAAI.
[6] D.M. Blei and J.D. Lafferty. Dynamic topic models. In: Proceedings of the International Conference on Machine learning (ICML'06), pages , Carnegie Mellon, Pennsylvania, USA, ACM Press.
[7] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3: , JMLR.
[8] S.P. Borgatti and M.G. Everett. Notions of position in social network analysis. Journal of Sociological Methodology, 22:1-35, JSTOR.
[9] G. Borgatti Martin and P. Stephen. Two algorithms for computing regular equivalence. Journal of Social Networks, 15(4): , Elsevier.
[10] R.L. Breiger, S.A. Boorman, and P. Arabie. An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12(3): , Elsevier.
[11] B. Chou and E. Suzuki. Discovering community-oriented roles of nodes in a social network. In: Proceedings of the Data Warehousing and Knowledge Discovery (DaWaK'10), pages 52-64, Bilbao, Spain, Springer.
[12] K.K.S. Chung, L. Hossain, and J. Davis. Exploring sociocentric and egocentric approaches for social network analysis. In: Proceedings of the International Conference on Knowledge Management (KMAP'05), Wellington, New Zealand,
[13] A. Daud, J. Li, L. Zhou, and F. Muhammad. A generalized topic modeling approach for maven search. In: Proceedings of the Advances in Data and Web Management (APWeb WAIM'09), pages , Suzhou, China, Springer.
[14] A. Daud, J. Li, L. Zhou, and F. Muhammad. Conference mining via generalized topic modeling. In: Proceedings of the Machine Learning and Knowledge Discovery in Databases (ECML PKDD'09), pages , Bled, Slovenia, Springer.
[15] X. Ding and B. Liu. The utility of linguistic rules in opinion mining. In: Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR'07), pages , Amsterdam, The Netherlands, ACM Press.
[16] B. Dom, I. Eiron, A. Cozzi, and Y. Zhang. Graph-based ranking algorithms for expertise analysis. In: Proceedings of the SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD'03), pages 42-48, San Diego, California, USA, ACM Press.
[17] P. Domingos. Mining social networks for viral marketing. Journal of Intelligent Systems, 20(1):80-82, IEEE Press.
[18] P. Domingos and M. Richardson. Mining the network value of customers. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01), pages 57-66, San Francisco, CA, USA, ACM Press.
[19] J.S. Donath. Identity and deception in the virtual community. Communities in cyberspace, pages 29-59, Psychology Press.
[20] P. Doreian, V. Batagelj, and A. Ferligoj. Generalized blockmodeling. Cambridge University Press,
[21] J. Edachery, A. Sen, and F. Brandenburg. Graph clustering using distance-k cliques. In: Proceedings of the International Conference on Graph Drawing, pages , Stirìn Castle, Czech Republic, Springer.
[22] C. Elkan. Method and system for selecting documents by measuring document quality, US Patent App. 10/004,514, Google Patents.
[23] W. Fan, L. Wallace, S. Rich, and Z. Zhang. Tapping the power of text mining. Communications of the ACM, 49(9):76-82, ACM Press.
[24] K. Faust. Comparison of methods for positional analysis: structural and general equivalences. Journal of Social Networks, 10(4): , Elsevier.
[25] S.E. Fienberg and S.S. Wasserman. Categorical data analysis of single sociometric relations. Journal of Sociological Methodology, 12: , JSTOR.
[26] D. Fisher, M. Smith, and H.T. Welser. You are who you talk to: detecting roles in Usenet newsgroups. In: Proceedings of the Hawaii International Conference on System Sciences (HICSS'06), pages 59b-59b, Island of Hawaii, USA, IEEE Press.
[27] W. Fu, L. Song, and E.P. Xing. Dynamic mixed membership blockmodel for evolving networks. In: Proceedings of the International Conference on Machine Learning (ICML'09), pages , Montreal, Canada, ACM Press.
[28] A. Ghose, P.G. Ipeirotis, and A. Sundararajan. Opinion mining using econometrics: a case study on reputation systems. In: Proceedings of the Association for Computational Linguistics (ACL'07), pages , Prague, Czech Republic, ACL.
[29] E. Goffman. The presentation of self in everyday life, Doubleday,
[30] S.A. Golder and J. Donath. Social roles in electronic communities. In: Proceedings of the International Conference of Internet Research (IR'04), pages 13-22, Brighton, England, Citeseer.
[31] D. Greene, D. Doyle, and P. Cunningham. Tracking the evolution of communities in dynamic social networks. In: Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM'10), pages , Odense, Denmark, IEEE Press.
[32] M.S. Handcock, A.E. Raftery, and J.M. Tantrum. Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(2): , Wiley Online Library.
[33] S. Hansell. Cooperative groups, weak ties, and the integration of peer friendships. Journal of Social Psychology Quarterly, 47(4): , JSTOR.
[34] K. M. Harris, F. Florey, J. Tabor, P. S. Bearman, J. Jones, and R. J. Udry. The national longitudinal study of adolescent health: research design. Technical report, Carolina population center, USA,
[35] G. H. Heil and H. C. Whit. An algorithm for constructing homomorphisms of multiple graphs, Department of Sociology, Harvard University, Unpublished paper.
[36] J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl. Evaluating collaborative filtering recommender systems. Journal of Transactions on Information Systems, 22(1):5-53, ACM Press.
[37] I. Himelboim, E. Gleave, and M. Smith. Discussion catalysts in online political discussions: content importers and conversation starters. Journal of Computer-Mediated Communication, 14(4): , Wiley Online Library.
[38] P.W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33-50, JSTOR.
[39] M. Hu, E-P. Lim, A. Sun, H.W. Lauw, and B-Q Vuong. Measuring article quality in wikipedia: Models and evaluation. In: Proceedings of the ACM Conference on Information and Knowledge Management (CIKM'07), pages , Lisboa, Portugal, ACM Press.

17 [40] M. Hu and B. Liu. Mining and summarizing customer reviews. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 04), pages , Seattle, WA, USA, ACM Press. [41] A. Kao and S. Poteet. Text mining and natural language processing - introduction for the special issue. SIGKDD Explorations, 7(1):1 2, ACM Press. [42] J.W. Kelly, D. Fisher, and M. Smith. Friends, foes, and fringe: norms and structure in political discussion network. In: Proceedings of the International Conference on Digital Government Research (DG.O 07), pages 21 24, San Diego, California, USA, ACM Press. [43] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 03), pages , Washington, DC, USA, ACM Press. [44] E.S. Kim and S.S. Han. An analytical way to find influencers on social networks and validate their effects in disseminating social games. In: Proceedings of the International Conference on Advances in Social Network Analysis and Mining (ASONAM 09), pages 41 46, Athens, Greece, IEEE Press. [45] J.M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5): , ACM Press. [46] F. Lorrain and H.C. White. Structural equivalence of individuals in social networks. Journal of Mathematical Sociology, 1(1):49 80, Routledge. [47] P. Massa and P. Avesani. Trust metrics on controversial users: balancing between tyranny of the majority and echo chambers. Journal on Semantic Web and Information Systems, 3(1):39 64, Citeseer. [48] A. McCallum, X. Wang, and A. Corrada-Emmanuel. Topic and role discovery in social networks with experiments on ENRON and academic . Journal of Artificial Intelligence Research, 30(1): , AI Access Foundation. [49] D.W. McDonald and M.S. Ackerman. Expertise recommender: a flexible recommendation system and architecture. 
In: Proceedings of the ACM International Conference on Computer Supported Cooperative Work (CSCW 00), pages , Philadelphia, Pennsylvania, USA, ACM Press. [50] S.F. Nadel and M. Fortes. The theory of social structure, Free Press, [51] J. O Donovan. Capturing trust in social web applications. Computing with Social Trust, 1: , Springer. [52] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report, Stanford InfoLab, USA, [53] G. Palla, A.L. Barabási, and T. Vicsek. Quantifying social group evolution. Nature, 446(7136): , Nature Publishing Group. [54] A. Revillard. Les interactions sur l Internet. Terrains et travaux, 1: , ENS Cachan. [55] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 02), pages 61 70, New York, NY, USA, ACM Press. [56] S.F. Sampson. Crisis in a cloister, unpublished Ph.D. Dissertation, Dept. of Sociology, Cornell University, USA, [57] J.E. Schwartz and M. Sprinzen. Structures of connectivity. Journal of Social Networks, 6(2): , Elsevier. [58] J. Scott. Social network analysis, Sage Publications, [59] J. Scripps, P.N. Tan, and A.H. Esfahanian. Node roles and community structure in networks. In: Proceedings of the Workshop on Web Mining and Social Network Analysis (WebKDD/SNAKDD 07), pages 26 35, San Jose, California, USA, ACM Press. [60] T.A.B. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75 10, Springer. [61] I. Soboroff, A. P. de Vries, and N. Craswell. Overview of the TREC 2006 enterprise track. In: Proceedings of the Text Retrieval Conference (TREC 06), Gaithersburg, MD, USA, Citeseer. [62] A. Stavrianou. Modeling and mining of web discussions, PhD Dissertation, University of Lyon, France, [63] A. Stavrianou, P. Andritsos, and N. Nicoloyannis. 
Overview and semantic issues of text mining. SIGMOD Record, 36(3):23 34, ACM Press. [64] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 04), pages , Seattle, WA, USA, ACM Press. [65] L. Streeter and K. Lochbaum. Who knows: a system based on automatic representation of semantic structure. In: Conference of Recherche d Information Assistée par Ordinateur (RIAO 88), pages , Cambridge, MA, USA, CID. [66] S.G. Sheffer T. Rohan, T.J. Tunguz-Zawislak and J. Harmsen. Network node ad targeting, United States Patent Application, [67] R.J. Udry. The national longitudinal study of adolescent health: (add health) waves i and ii ; wave iii Technical report, University of North Carolina, USA, [68] T.W. Valente and R.L. Davis. Accelerating the diffusion of innovations using opinion leaders. The Annals of the American Academy of Political and Social Science, 566(1):55 67, Sage Publications. [69] F.B. Viégas and M. Smith. Newsgroup crowds and authorlines: Visualizing the activity of individuals in conversational cyberspaces. In: Proceedings of the Hawaii International Conference on System Science (HiCSS 04), island of Hawaii, USA, IEEE Press. [70] Y.J. Wang and G.Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8 19, JSTOR. [71] S. Wasserman and C. Anderson. Stochastic a posteriori blockmodels: construction and assessment. Journal of Social Networks, 9(1):1 36, Elsevier. [72] H.T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. Journal of Social Structure, 8(2):1 31, [73] D.R. White and K.P. Reitz. Graph and semigroup homomorphisms on networks of relations. Journal of Social Networks, 5(2): , Elsevier. [74] H.C. White, S.A. Boorman, and R.L. Breiger. 
Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4): , JSTOR. [75] A. Wolfe and D. Jensen. Playing multiple roles: discovering

18 overlapping roles in social networks. In: Proceedings of the Workshop on Statistical Relational Learning and its Connections to Other Fields (ICML-SRL 04), Banff, Alberta, Canada, ACM Press. [76] J. Zhang, M.S. Ackerman, and L. Adamic. Expertise networks in online communities: Structure and algorithms. In: Proceedings of the International Conference on World Wide Web (WWW 07), pp , Banff, Alberta, Canada, ACM Press. [77] C.N. Ziegler and J. Golbeck. Investigating interactions of trust and interest similarity. Decision Support Systems, 43(2): , Elsevier.