Mining Social Media with Social Theories: A Survey
Jiliang Tang, Computer Science & Eng, Arizona State University, Tempe, AZ, USA, [email protected]
Yi Chang, Yahoo! Labs, Yahoo! Inc, Sunnyvale, CA, USA, [email protected]
Huan Liu, Computer Science & Eng, Arizona State University, Tempe, AZ, USA, [email protected]

ABSTRACT

The increasing popularity of social media encourages more and more users to participate in various online activities and produces data at an unprecedented rate. Social media data is big, linked, noisy, highly unstructured and incomplete, and differs from data in traditional data mining, which cultivates a new research field: social media mining. Social theories from social sciences are helpful to explain social phenomena. The scale and properties of social media data are very different from those of the data social sciences use to develop social theories. As a new type of social data, social media data raises a fundamental question: can we apply social theories to social media data? Recent advances in computer science provide the necessary computational tools and techniques for us to verify social theories on large-scale social media data. Social theories have been applied to mining social media. In this article, we review some key social theories in mining social media, their verification approaches, interesting findings, and state-of-the-art algorithms. We also discuss some future directions in this active area of mining social media with social theories.

1. INTRODUCTION

Social media greatly enables people to participate in online activities and shatters the barrier for online users to share and consume information in any place at any time. Social media users can be both passive content consumers and active content producers, and generate data at an unprecedented rate. The nature of social media determines that its data significantly differs from the data in traditional data mining.
Social relations are pervasively available in social media data, and play important roles in social media such as mitigating the information overload problem [38; 51] and promoting the information propagation process [4; 67]. Social media data is big, noisy, incomplete, highly unstructured and linked with social relations. These unique properties of social media data suggest that naively applying existing techniques may fail or lead to inappropriate understandings about the data. For example, social media data is linked via social relations, which contradicts the underlying independent and identically distributed (IID) assumption of the vast majority of existing techniques [23; 57]. This new type of data calls for novel data mining techniques for a better understanding from the computational perspective. The study and development of these new techniques are under the purview of social media mining, which is the process of representing, analyzing, and extracting actionable patterns from social media data [70].

[Figure 1: Social Theories in Social Media Mining.]

There are many social theories developed from social sciences to explain various types of social phenomena. For example, the homophily theory [40] suggests how individuals connect to each other, while balance theory suggests that users in a social network tend to form a balanced network structure [17]. The scale of the data social scientists employ to develop these social theories is very different from that of social media data. It is easy for social media data to include the actions and interactions of hundreds of millions of individuals in real time as well as over time.
Therefore there is a fundamental question for this new type of social data: can we apply some social theories to social media data? If we can, social theories can help us understand social media data from a social perspective, and combining social theories with computational methods offers a novel and effective perspective to mine social media data, as shown in Figure 1. Social theories help bridge the gap from what we have (social media data) to what we want to understand about social media data (social media mining). Integrating social theories with computational models becomes an interesting direction in mining social media data and encourages a large body of literature in this line. The goal of this article is to provide a review of some key social
theories in mining social media data. The contributions and organization of this article are summarized below:

- The social property of social media data determines that it differs from data in traditional data mining and social sciences. In Section 2, we provide an overview of the unique properties of social media data;
- An increasing number of social theories have been verified in social media data. In Section 3, we focus on three key and widely used social theories with basic concepts, verification approaches and key findings;
- The fast growing interest in and intensifying need to harness social media data make social media mining grow rapidly. Integrating social theories with computational methods becomes a principled way to mine social media data. In Section 4, we review the state-of-the-art algorithms that exploit social theories in mining social media, and summarize feature engineering, constraint generating and objective defining as three ways to explain social theories for computational models;
- Social theories in mining social media data is still an active area of exploration and there could be more existing social theories to be employed or new social theories to be discovered from this new type of social media data. In Section 5, we discuss some open issues and possible research directions.

2. SOCIAL MEDIA DATA IS SOCIAL

Social relations are pervasively available and the social property of social media data determines that social media data is substantially different from data in traditional data mining and social sciences. In this section, we discuss some unique properties of social media data. Before details, we first introduce some notations used in this article. Let $U = \{u_1, u_2, \ldots, u_n\}$ and $P = \{p_1, p_2, \ldots, p_m\}$ be the sets of $n$ users and $m$ items (or pieces of user generated content), respectively. We use $S \in \mathbb{R}^{n \times n}$, $R \in \mathbb{R}^{n \times m}$ and $C \in \mathbb{R}^{m \times K}$ to denote the user-user relation, user-content interaction and content-feature matrices, where we extract a set of $K$ features $F$ to represent the content set $P$.
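
To make the notation concrete, the following is a toy sketch in Python. The sizes, the binary encoding of $S$ and $R$, the feature values and the helper `items_of_neighbors` are all assumptions made for illustration; none of them come from a dataset discussed in this article.

```python
# Toy instance of the notation: n = 3 users, m = 4 items, K = 2 features.
# All values below are made up for illustration.
n, m, K = 3, 4, 2

# S (n x n): user-user relations; S[i][j] = 1 if user i is connected to user j.
S = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# R (n x m): user-content interactions; here R[i][j] = 1 if user i
# interacted with item j (a binary encoding is assumed).
R = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# C (m x K): content-feature matrix; row j is the feature vector of item p_j.
C = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
    [0.2, 0.8],
]


def items_of_neighbors(i):
    """Indices of items that the users linked to user i interacted with."""
    neighbors = [j for j in range(n) if S[i][j] != 0]
    return sorted({p for j in neighbors for p in range(m) if R[j][p] != 0})
```

The helper shows how $S$ and $R$ together connect a user to the content of linked users, which is what makes the data non-IID.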
Big: In social media, we have little data for each specific individual. However, the social property of social media data links individuals' data together, which provides a new type of big data. For example, more than 300 million tweets are sent to Twitter per day; more than 3,000 photos are uploaded to Flickr per minute, and more than 153 million blogs are posted per year.

Linked: The availability of social relations determines that social media data is inherently linked [52]. An illustrative example is shown in Figure 2, where user generated content ($p_1$ to $p_8$) is linked via social relations among users ($u_1$ to $u_4$). Linked social media data is patently not independent and identically distributed, which contradicts one of the most enduring and deeply buried assumptions of traditional data mining and machine learning methods [23; 57].

[Figure 2: Linked Social Media Data.]

Noisy: A successful data mining exercise entails extensive data preprocessing and noise removal, as "garbage in, garbage out". However, social media data can contain a large portion of noisy data. Users in social media can be both passive content consumers and active content producers, causing the quality of user generated content to vary drastically [1]. The noise issues of social media data do not stop here. The social networks in social media are also noisy. First, some social media users work as spammers to spread malicious or unwanted messages [47]. Second, the low cost of link formation leads to acquaintances and best friends being mixed together [65].

Unstructured: User generated content in social media is often highly unstructured. Nowadays more and more users use their mobile devices to publish content such as updating statuses in Facebook, sending tweets in Twitter and commenting on posts, which results in (1) short texts and (2) typos and spacing errors occurring very frequently [25]. Free-form languages are widely adopted by social media users in online communication, such as ASCII art (e.g., :) and :( ) and abbreviations (e.g., h r u?) [46].
The short and highly unstructured social media data challenges the vast majority of existing techniques.

Incomplete: Users' attributes are predictable from their personal data [26]. To address such privacy concerns, social media services often allow their users to use their profile settings to mark their personal data, such as demographic profiles, status updates, lists of friends, videos, photos, and interactions on posts, as invisible to others. For example, a very small portion of Facebook users (< 1%) make their personal data publicly available [41]. The available social media data could be incomplete and extremely sparse. For example, for social recommendation, more than 99% of the entries in the user-content interaction matrix $R$ are missing [51].

3. SOCIAL THEORIES

Social theories from social sciences are useful to explain various types of social phenomena. In social media, it is increasingly possible for us to observe social data from hundreds of millions of individuals. Given its large scale and social property, a natural question is: can we apply social theories to social media data? More and more social theories have been shown to be applicable to social media data. In this section, we concentrate on three important social theories with basic concepts, ways to verify them and key findings.

3.1 Social Correlation Theory

Social correlation theory is one of the most important social theories and it suggests that there exist correlations between behaviors or attributes of adjacent users in a social network. Homophily, influence and confounding are three major social processes to explain these correlations, as shown in Figure 3.

Homophily explains our tendency to connect to others that share certain similarity with us. For example, birds of a feather flock together.
[Figure 3: Major Forces of Social Correlation. Panels: (A) Homophily, (B) Influence, (C) Confounding.]

Influence suggests that people tend to follow the behaviors of their friends, so adjacent users are likely to exhibit similar behaviors. For example, if most of one's friends switch to a mobile phone company, he could be influenced by them and switch, too.

Confounding is a correlation between users that can also be forged due to external influences from the environment. For example, two individuals living in the same city are more likely to become friends than two random individuals.

To help us verify the applicability of social correlation theory to social media data, essentially we need to answer the following question: are users with social relations more similar than those without? To answer this question, for each social relation from $u_i$ to $u_j$, we calculate two similarities $s_{ij}$ and $r_{ik}$, where $s_{ij}$ is the similarity between $u_i$ and $u_j$, while $r_{ik}$ is the similarity between $u_i$ and a randomly chosen user $u_k$ who does not connect to $u_i$. Let $S$ be the set of $s_{ij}$'s, which denotes the set of similarities of pairs of connected users. Let $R$ be the set of $r_{ik}$'s, which represents the set of similarities of pairs of randomly chosen users. We perform a t-test on $S$ and $R$. The null hypothesis is that similarities with social relations are no larger than those without, i.e., $H_0: S \le R$; the alternative hypothesis is that the similarities with social relations are larger than those without, i.e., $H_1: S > R$. If there is strong evidence to reject the null hypothesis, we verify that social correlation theory is applicable to social media data.

Via the above verification process, social correlation theory has been shown to be applicable to various social media sites. Twitter users with following relations are likely to share similar topics or opinions [63; 20].
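
The verification procedure for social correlation theory can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the cited studies: users are represented by hypothetical numeric profile vectors, similarity is cosine similarity, the statistic is Welch's two-sample t statistic, and the one-sided p-value uses a normal approximation to the t distribution, which is reasonable only for large samples. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, variance


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def correlation_test(profiles, edges, seed=0):
    """One-sided two-sample test of H0: S <= R against H1: S > R.

    profiles: dict user -> feature vector (assumed representation).
    edges: list of (i, j) social relations.
    Returns (t statistic, approximate one-sided p-value).
    Needs at least two usable relations so the variances are defined.
    """
    rng = random.Random(seed)
    users = list(profiles)
    neighbors = {u: set() for u in users}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    s, r = [], []
    for i, j in edges:
        # Users k that are not connected to i.
        candidates = [k for k in users if k != i and k not in neighbors[i]]
        if not candidates:
            continue
        k = rng.choice(candidates)
        s.append(cosine(profiles[i], profiles[j]))  # s_ij
        r.append(cosine(profiles[i], profiles[k]))  # r_ik
    # Welch's t statistic; the p-value uses a normal approximation.
    se = math.sqrt(variance(s) / len(s) + variance(r) / len(r))
    if se == 0:
        raise ValueError("similarities have zero variance")
    t = (mean(s) - mean(r)) / se
    return t, 1.0 - NormalDist().cdf(t)
```

A small p-value is evidence against $H_0$, i.e., in favor of connected users being more similar than unconnected ones.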
Users in Epinions with trust relations are likely to rate the same items with similar scores [49]. In [52], we show that users in Digg and Blog Category with social relations are likely to join groups of similar interests. In location-based social networks such as Foursquare, users with social relations are likely to check in at the same locations [69; 14].

3.2 Balance Theory

In general, balance theory implies the intuition that "the friend of my friend is my friend" and "the enemy of my enemy is my friend" [17]. Basically, it considers the balance of signs on a triad involving three users in a social network with positive and negative links [28; 27]. We use $s_{ij}$ to denote the sign of the relation between $u_i$ and $u_j$, where $s_{ij} = 1$ (or $s_{ij} = -1$) if we observe a positive relation (or a negative relation) between $u_i$ and $u_j$. Balance theory suggests that a triad $\langle u_i, u_j, u_k \rangle$ is balanced if $s_{ij} = 1$ and $s_{jk} = 1$, then $s_{ik} = 1$; or $s_{ij} = -1$ and $s_{jk} = -1$, then $s_{ik} = 1$.

[Figure 4: An Illustration of Balance Theory. Panels: A (+,+,+), B (+,+,-), C (+,-,-), D (-,-,-).]

For a triad $\langle u_i, u_j, u_k \rangle$, there are four possible sign combinations, A(+,+,+), B(+,+,-), C(+,-,-) and D(-,-,-), as shown in Figure 4, while only A(+,+,+) and C(+,-,-) are balanced. The way to verify balance theory is straightforward. We examine all triads $\langle u_i, u_j, u_k \rangle$ and check the ratio of A(+,+,+) and C(+,-,-) among all four possible sign combinations. A high ratio suggests that balance theory is applicable to social media data. We check the distributions of the four possible sign combinations on three widely used social media datasets with signed networks (i.e., Epinions, Slashdot and Wikipedia) in [27]. The ratios of A(+,+,+) and C(+,-,-) among all four possible sign combinations are 0.941, 0.912, and … in Epinions, Slashdot and Wikipedia, respectively. More than 90% of triads are balanced. Similar observations on other social media datasets are reported by [68; 56].
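
The balance check described above can be sketched as follows. This is a minimal illustration, assuming an undirected signed network stored as a dictionary from node pairs to signs; the function name and data layout are hypothetical, and the brute-force enumeration of all node triples is only practical for small graphs.

```python
from itertools import combinations


def balanced_ratio(signs):
    """Fraction of closed triads whose sign product is positive.

    signs: dict mapping frozenset({i, j}) -> +1 or -1 (undirected links).
    """
    nodes = sorted({u for pair in signs for u in pair})
    total = balanced = 0
    for i, j, k in combinations(nodes, 3):
        edges = [frozenset((i, j)), frozenset((j, k)), frozenset((i, k))]
        if not all(e in signs for e in edges):
            continue  # not a closed triad
        total += 1
        product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
        # (+,+,+) and (+,-,-) have product +1 (balanced);
        # (+,+,-) and (-,-,-) have product -1 (unbalanced).
        if product > 0:
            balanced += 1
    return balanced / total if total else float("nan")
```

Using the product of the three signs is equivalent to counting negative links: a triad is balanced exactly when it has zero or two negative links.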
Note that balance theory is developed for undirected social networks and we usually ignore link directions when applying balance theory to directed social networks [27].

3.3 Status Theory

Different from balance theory, status theory is developed for directed social networks [28]. Social status refers to the position or rank of a user in a social community, and represents the degree of honor or prestige attached to the position of each individual. In status theory, a positive link from $u_i$ to $u_j$ indicates that $u_j$ has a higher status than $u_i$, while a negative link from $u_i$ to $u_j$ indicates that $u_j$ has a lower status than $u_i$. For a triad, status theory suggests that if we take each negative relation, reverse its direction, and flip its sign to positive, then the resulting triangle (with all positive edge signs) should be acyclic.

In [28], contextualized links are introduced to verify status theory. A contextualized link is defined to be a triple $\langle u_i, u_j, u_k \rangle$ with the property that a link forms from $u_i$ to $u_j$ after each of $u_i$ and $u_j$ already has a link either to or from $u_k$. The link between $u_k$ and $u_i$ can go in either direction and have either sign, yielding four possibilities, and similarly for the link between $u_k$ and $u_j$; hence overall there are $4 \times 4 = 16$ different types of contextualized links. Figure 5 demonstrates 4 of the 16 types of contextualized links, where (A)
and (D) satisfy status theory, while (B) and (C) do not.

[Figure 5: An Illustration of Four Out of Sixteen Types of Contextualized Links for Status Theory. Note that + (or -) denotes that the target node has a higher (or lower) status than the source node.]

For each of these types of contextualized links, we can count the frequencies of positive versus negative links for the links from $u_i$ to $u_j$ and then calculate the ratio of contextualized links satisfying status theory. In [53], it is reported that 99% of triads in the Enron social network and the advisor-advisee social network satisfy status theory. Similar patterns are observed on the Epinions and Wikipedia datasets in [28].

3.4 Discussion

The scale and properties of social media data substantially differ from those of the data used by social sciences to develop social theories. Since social media data is a new type of social data, it is possible to apply some social theories to explain phenomena in social media data. The verification of social theories in social media data not only paves a way for us to understand social media data from a social perspective but also suggests that it is highly possible to facilitate social media mining tasks by integrating social theories with computational methods.

4. SOCIAL THEORIES IN SOCIAL MEDIA MINING TASKS

Social media mining is an emerging discipline under the umbrella of data mining and has grown rapidly in recent years [70]. The verification of some social theories in social media data suggests that we should put "social" in social media mining and encourages a large body of literature to model and exploit social theories to advance social media mining tasks. In general, there are three types of objects in social media data: users, social relations and user generated content, which allows us to roughly classify social media mining tasks into three groups based on the mining objects: user-based tasks, relation-based tasks and content-based tasks.
Next we elaborate on each group with representative tasks, their definitions, challenges and the state-of-the-art algorithms that apply social theories to these specific tasks.

4.1 Social Theories in User-Related Tasks

For individuals, a better understanding of their social networks can help them share and collect reliable information more effectively and efficiently. For social media service providers, a better understanding of their customers can help them provide better services. User-related tasks provide necessary and effective means to understand social media users. In this subsection, we review social theories in some key user-related tasks.

4.1.1 Community Detection

Communities in social media can be explicit, such as Yahoo! Groups. However, in many social media sites, communities are implicit and their members are obscure to social media users. Community detection is proposed to find these implicit communities in social media by identifying groups of users that are more densely connected to each other than to the rest of the network [55]. Detecting implicit communities can benefit many social media mining tasks such as social targeting and personalization. The major difference between clustering in data mining and community detection is that in community detection, individuals are connected to others in social networks, while in clustering, data points are not embedded in a network and they are assumed to be independent and identically distributed. Formally, for a social network $G(U, S)$, community detection is to find a set of communities $C$ where users are more densely connected within a community than to the rest of users.

Homophily suggests that similar users are likely to be linked, and influence indicates that linked users will influence each other and become more similar. The suggestions from social correlation theory in creating new ties based on similarity give rise to macro patterns of associations, also known as communities [7]. Two users in the same community have higher similarity [44].
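
One widely used score for how much more densely users are connected within communities than expected by chance is modularity, $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \delta(c_i, c_j)$, where $A$ is the adjacency matrix, $d_i$ the degree of $u_i$, $m$ the number of links and $c_i$ the community of $u_i$. The following is a minimal sketch of this standard formula for an undirected, unweighted network; the adjacency-set layout and the function name are assumptions made for illustration, and the quadratic loop is only suitable for small graphs.

```python
def modularity(adj, community):
    """Modularity of a partition of an undirected, unweighted graph.

    adj: dict node -> set of neighbors (must be symmetric).
    community: dict node -> community label.
    """
    degree = {u: len(vs) for u, vs in adj.items()}
    two_m = sum(degree.values())  # every link is counted twice
    if two_m == 0:
        return 0.0
    q = 0.0
    for u in adj:
        for v in adj:
            if community[u] != community[v]:
                continue
            actual = 1.0 if v in adj[u] else 0.0          # A_uv
            expected = degree[u] * degree[v] / two_m      # d_u d_v / 2m
            q += actual - expected
    return q / two_m
```

For two triangles joined by a single link, splitting the network into the two triangles scores higher than placing all users in one community, which scores zero.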
The modularity maximization method maximizes the sum of the actual number of social relations between two users minus the expected number of social relations between them, since two users in the same community should have a higher probability to establish a relation than two randomly chosen users [43]. Wang et al. [60] find that users within a community are likely to share similar tags in social tagging systems and they take advantage of the bipartite network between users and tags in social tagging systems to discover these overlapping communities. In [66], a density-based framework is proposed with the intuition that users in the same community should interact more frequently with each other.

Recently, applying balance theory to detect communities from signed networks has attracted increasing attention. In [11], a generalized balance theory is proposed where a network is k-balanced iff users can be partitioned into k subsets such that positive links lie within the sets and negative links lie between them. Balance theory suggests that the assignment of users related by negative links should be done the opposite way of positive links, with negative links sparse within and more dense between communities; therefore the Potts model is extended to incorporate both positive and negative links to detect communities in signed networks [58]. In [2], a two-objective approach is proposed for community detection in signed networks based on balance theory. One objective is that the partitioning should have dense positive intra-connections and sparse negative inter-connections, and the other is that it should have as few negative intra-connections and positive inter-connections as possible.

4.1.2 User Classification
Due to privacy concerns, social media users tend to hide their profiles. For social media service providers, users' profile information is useful for customizing their services to the users in many ways, such as friend and content recommendations and personalized search. The more they know about users and their preferences, the better they can serve them. Given a social network and some user information (attributes, preferences or behaviors), user classification is designed to infer the information of other users in the same network [15]. In the user classification problem, users in $U$ are partially labeled as $U = [U_L, U_U]$, where $U_L$ and $U_U$ are the sets of labeled and unlabeled users, respectively. Formally, the task of user classification is to label the users in $U_U$ from a finite set of categorical values, given the social network $G(U, S)$ and $U_L$.

Social correlation theory suggests that the labels of linked users should be correlated, which is the major reason why researchers believe that the labels of $U_U$ can be predicted with the network structure and the partially labeled users [15]. Social correlation theory is the underlying assumption of most existing user classification methods, which design algorithms for collective classification. A typical user classification algorithm includes parts of the following three components [37]:

- A local classifier: it is used for initial label assignment;
- A relational classifier: it learns a classifier from the labels of a user's neighbors to the label of that user, as suggested by social correlation theory; and
- Collective classification: it applies the relational classifier to each node iteratively until the inconsistency between neighboring labels is minimized.

In [36], a weighted-vote relational neighborhood classifier, wvRN, is introduced for user classification. wvRN is like a lazy learner and estimates the labels of users as the weighted mean of their neighbors.
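
The weighted-mean idea behind wvRN can be sketched as follows. This is a simplified illustration and not necessarily the exact procedure of [36]: the synchronous update schedule, the uniform prior for unlabeled users, the fixed number of iterations and all names are assumptions.

```python
def wvrn(weights, known, classes, iterations=20):
    """Weighted-vote relational neighbor sketch with iterative updates.

    weights: dict node -> dict neighbor -> positive link weight; every
        neighbor must also appear as a key of `weights`.
    known: dict node -> class label for labeled users (kept fixed).
    classes: list of all class labels.
    Returns dict node -> dict class -> estimated probability.
    """
    uniform = {c: 1.0 / len(classes) for c in classes}
    probs = {}
    for u in weights:
        if u in known:
            probs[u] = {c: 1.0 if c == known[u] else 0.0 for c in classes}
        else:
            probs[u] = dict(uniform)  # assumed prior for unlabeled users
    for _ in range(iterations):
        updated = {}
        for u in weights:
            if u in known:
                continue
            total = sum(weights[u].values())
            if total == 0:
                continue  # isolated user: keep the prior
            # Weighted mean of the neighbors' current class estimates.
            updated[u] = {
                c: sum(w * probs[v][c] for v, w in weights[u].items()) / total
                for c in classes
            }
        probs.update(updated)
    return probs
```

For example, an unlabeled user linked with weight 3 to a user of class "x" and with weight 1 to a user of class "y" receives probability 0.75 for "x".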
In [34], the proposed framework first creates relational features of one user by aggregating the label information of its neighbors and then a relational classifier can be constructed based on labeled data. Neville and Jensen in [42] propose to use clustering algorithms to find the cluster memberships of each user first, and then fix the latent group variables for later inference. Xiang et al. [64] propose a novel latent relational model based on copulas. It can make predictions in a discrete label space while ensuring identical marginals and at the same time incorporating some desirable properties of modeling relational dependencies in a continuous space. A community-based framework is proposed in [54]. It first extracts overlapping communities based on the social network structure, then uses communities as features to represent users, and finally a traditional classifier such as SVM is trained to assign labels to unlabeled users in the same network.

4.1.3 Social Spammer Detection

Social media has become an important and efficient way to disseminate information. Given its popularity and ubiquity, social spammers create many fake accounts and send out unsolicited commercial content [62]. Social spammers have become rampant and the volume of spam has increased dramatically. For example, 83% of the users of social networks have received at least one unwanted friend request or message [47]. This not only causes misuse of communication bandwidth, storage space and computational power, but also wastes users' time and violates their privacy rights. Therefore developing effective social spammer detection techniques is critically important in improving user experience and positively affecting the overall value of social media services [47]. Given a social network $G(U, S)$, social spammer detection is to find a set of spammers $U_S$ from $U$ with $U_S \subset U$. Based on social correlation theory, there are two observations for normal users and spammers [73]. First, normal users perform similarly to their neighbors.
Second, spammers perform differently from their neighbors since most of their neighbors are normal users. Therefore a social regularization term is proposed under the matrix factorization framework to model these observations, where two connected normal users should be close in the latent space since they share similar interests and may perform similar social activities, while spammers should be far away from their neighbors in the latent space. In Twitter, users have directed following relations and spammers can easily follow a large number of normal users within a short time. In [19], we divide user-user following relations in Twitter into four types: [spammer, spammer], [normal, normal], [normal, spammer], and [spammer, normal]. Since the fourth relation can be intentionally faked by spammers, we only consider the first three types of relations. Specifically, we introduce a graph regularization term to model social correlation theory in the directed social relations, which is integrated into the standard Lasso formulation to train a linear classifier for social spammer detection.

Spammers and normal users have very different social behaviors. Normal users are likely to form a group with other normal users, while spammers are likely to form spammer groups [29]. In [6], the authors incorporate community-based features of users with basic topological features to improve spammer classifiers. The approach first finds the overlapping community structure of users and then extracts features based on these communities, such as features that express the role of a user in the community structure (e.g., a boundary node or a core node) and the number of communities the user belongs to.

4.2 Social Theories in Relation-Related Tasks

A social network is usually represented by a binary adjacency matrix. First, the matrix is extremely sparse since there are many pairs of users with missing relations. Second, social networks in social media are more complicated.
For example, strengths of relations might be heterogeneous, such as acquaintances and best friends, while a social network may be a composite of various types of relations such as family, classmates and colleagues. Relation-related tasks focus on mining relations among users and aim to reveal a fine-grained and comprehensive view of social relations. Signed networks arise in social networks in various ways when users can implicitly or explicitly tag their relationships with other users as positive or negative. In this section, we review social theories in some key relation-related tasks on signed and unsigned networks.

4.2.1 Link Prediction

It is critical for social media sites to provide services that encourage more user interactions with better experience, such as expanding one's social network. One effective way is to automatically recommend connections, since it is hard for users to figure out who is available on social media sites.
Most social media sites, such as Facebook, Twitter and LinkedIn, provide friend recommendation services to their customers. The essential problem of friend recommendation is known as link prediction [30]. When there is no relation between $u_i$ and $u_j$, $S_{ij} = 0$. The task of link prediction is to predict which pairs of users $u_i$ and $u_j$ without relations ($S_{ij} = 0$) are likely to get connected, given a social network $G(U, S)$.

Unsigned Networks: Homophily in social correlation theory suggests that similar users are likely to establish social relations. In [30], various similarity measurements based on the network structure, such as common neighbors, are reviewed for link prediction. One challenging problem in link prediction is the sparsity problem: some users may have very few or even no links. In [49], a low-rank matrix factorization framework with homophily effect, hTrust, is proposed to predict trust relations. Homophily coefficients are defined to measure the strength of homophily among users. The stronger the homophily between two users is, the smaller the distance between them in the latent space is. Homophily regularization is then defined to model the homophily effect by controlling users' distances in the latent space with the help of homophily coefficients. Through homophily regularization, trust relations can be suggested to users with few or even no relations, which mitigates the sparsity problem in link prediction. The confounding effect in social correlation theory suggests that people who share a high degree of overlap in their trajectories are expected to have a better likelihood of forming new links. In [59], the effect of confounding is investigated for link prediction. Specifically, it leverages mobility information to extract features which can capture some degree of closeness in the physical world between two individuals.
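
As an illustration of the structure-based similarity measurements mentioned above, the following minimal sketch ranks unconnected pairs of users by their number of common neighbors. The adjacency-set representation and the function name are assumptions made for illustration; this is the simplest such measure and not the method of any specific cited work.

```python
def common_neighbor_scores(adj):
    """Score every unconnected pair (i, j) by its number of common neighbors.

    adj: dict node -> set of neighbors (symmetric, unsigned network).
    Returns a list of ((i, j), score) sorted by decreasing score.
    """
    nodes = sorted(adj)
    scores = []
    for a, i in enumerate(nodes):
        for j in nodes[a + 1:]:
            if j in adj[i]:
                continue  # S_ij != 0: the pair is already connected
            scores.append(((i, j), len(adj[i] & adj[j])))
    # Stable sort: ties keep their enumeration order.
    scores.sort(key=lambda item: -item[1])
    return scores
```

Pairs at the top of the returned list are the candidate links to recommend. Users with no links at all always score zero, which is one face of the sparsity problem noted above.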
Status theory suggests new links are more likely to be attached from users with low statuses to those with high statuses, and preferential attachment models are widely used for link prediction based on status measures such as the degree of nodes and PageRank [5].

Signed Networks: In [27], local-topology-based features (16 triad types) based on balance theory and status theory are extracted to improve the performance of a logistic regression classifier in signed relation prediction. In [13], the authors use a probabilistic treatment of trust combined with a modified spring-embedded layout algorithm to classify a relation based on balance theory. Instead of having all users repel, the model adds a repelling force only between users connected with a negative relation to capture balance theory. For example, if one is friends with an enemy of the other, the forces will push them to different locations. In [10], the authors show how any quantitative measure of social imbalance in a network can be used to derive a link prediction algorithm and extend the approach in [27] by presenting a supervised machine learning based link prediction method that uses features derived from longer cycles in the network. The motivation for deriving features from longer cycles is that higher-order cycles in a signed network yield a measure of imbalance suggested by balance theory. In [18], it is shown that the notion of weak structural balance in signed networks naturally leads to a global low-rank model for the network. Under such a model, the sign inference problem can be formulated as a low-rank matrix completion problem.

4.2.2 Social Tie Prediction

Social networks in social media can be a composite of various types of relations. For example, the relation types in Facebook could be family, colleagues, classmates and friends. However, in most online networks such as Facebook, Twitter and LinkedIn, such type information is usually unavailable [56]. Different types of relations may influence people in different ways.
For example, one user's work style may be mainly influenced by her/his colleagues, while daily life habits may be strongly affected by her/his family. It is necessary and important to reveal these different types of social relations; therefore we ask whether we can automatically infer the types of social relations for social networks in social media. A novel task of social tie prediction is designed to answer the above question, which aims to predict the type of a given social relation. A nonzero value of $S_{ij}$ suggests that there is a connection between $u_i$ and $u_j$. Formally, social tie prediction is to predict the type of a social relation between $u_i$ and $u_j$ with $S_{ij} \neq 0$ from a finite set of categorical types such as {family, classmates, colleagues, friends}.

In [53], a framework is proposed to classify the type of social relationships by learning across heterogeneous networks. The framework incorporates social theories such as balance theory and status theory into a factor graph model, which effectively improves the accuracy of inferring the type of social relationships in a target network by borrowing knowledge from a different source network. Balance theory and status theory should be general over different types of networks. To transfer knowledge from the source network to the target network, transfer features are extracted based on balance theory and status theory, which are shared by different types of networks. In particular, from social balance, the paper extracts triad-based features to denote the proportion of different balanced triangles in a network; and from status theory, it defines features over triads to respectively represent the probabilities of the seven most frequent formations of triads. Different from [53], approaches are suggested by [68] to model balance theory and status theory mathematically. To model balance theory, it introduces a one-dimensional latent factor $\beta_i$ for each user $u_i$ and defines the sign between $u_i$ and $u_j$ as $s_{ij} = \beta_i \beta_j$.
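
A short check of why this factorization encodes balance theory, as a sketch that assumes every $\beta_i$ is nonzero and that the sign of $s_{ij}$ is read as the sign of the link: for any triad $\langle u_i, u_j, u_k \rangle$,

```latex
s_{ij}\, s_{jk}\, s_{ik}
  = (\beta_i \beta_j)(\beta_j \beta_k)(\beta_i \beta_k)
  = \beta_i^2\, \beta_j^2\, \beta_k^2
  > 0 .
```

A positive product means the triad has either zero or two negative links, i.e., it is of type (+,+,+) or (+,-,-), which are exactly the balanced configurations of Section 3.2. Under these assumptions the model cannot represent an unbalanced triad.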
To model status theory, [68] introduces a global user-independent parameter $\eta$ to capture the partial ordering of users. $\eta$ maps the latent user profile $\gamma_i$ of user $i$ to a scalar quantity $l_i = \eta \gamma_i$, which reflects user $i$'s social status. According to status theory, it characterizes social ties from $i$ to $j$ by modeling the relative status difference between them as $l_{ij} = l_i - l_j$.

Tie Strength Prediction

Social media users can have hundreds of social relations. However, a recent study shows that Twitter users have a very small number of friends compared to the number of followers and followees they declare [21]. The low cost of link formation in social media can lead to networks with heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together) [65]. Pairs of users with strong ties are likely to share greater similarity than those with weak ties; a better understanding of the strengths of social relations can therefore help social media sites serve their customers, for example through better recommendations and more effective friend management tools. This gives rise to the problem of tie strength prediction. In the binary relation representation, once there is a connection between $i$ and $j$, $S_{ij} = 1$. The task of tie strength prediction is to predict a connection strength between 0 and 1 for $i$ and $j$ with
$S_{ij} = 1$. After tie strength prediction, the binary relation representation matrix with $S_{ij} \in \{0,1\}$ is converted into a continuous-valued relation representation matrix with $S_{ij} \in [0,1]$.

In [24], guided by social correlation theory, four different categories of features, i.e., attribute similarity, topological connectivity, transactional connectivity, and network transactional connectivity, are extracted from sources including friendship links, profile information, wall postings, picture postings, and group memberships. Various classifiers are then trained to predict link strength from transactional information based on these extracted features. An unsupervised latent variable model is proposed in [65] to predict tie strength in online social networks from user profiles and interactions. One key underlying assumption of the proposed model is social correlation theory. Homophily in social correlation theory postulates that users tend to form ties with other people who have similar characteristics, and it is likely that the stronger the tie, the higher the similarity. Thus the proposed framework models the tie strength as the homophily effect of nodal profile similarities. The relationship strength directly influences the nature and frequency of online interactions between a pair of users: the stronger the relationship, the higher the likelihood that a certain type of interaction will take place between the pair. Therefore the proposed framework models the relationship strength as the hidden cause of influence among users.

4.3 Social Theories in Content-Related Tasks

Numerous techniques have been developed over the last decade for content mining tasks such as classification and clustering. User generated content in social media is usually linked, noisy, highly unstructured and incomplete, which makes existing techniques difficult to apply to these mining tasks on user generated content in social media.
Before social media became popular, researchers had already noticed that exploiting link information can improve content classification [72] and clustering [32]. The popularity of social media makes social relations pervasively available, which encourages the exploitation of social relations in more and more mining tasks. Social theories can help us understand social relations better, and in this subsection we review how social theories help some representative content-related tasks.

Social Recommendation

The pervasive use of social media generates massive data at an unprecedented rate, and the information overload problem becomes increasingly severe for social media users. Recommendation has been proven effective in mitigating the information overload problem; it improves the quality of the user experience and positively impacts the success of social media. Users in the physical world are likely to seek suggestions from their friends before making a purchase decision, and users' friends consistently provide good recommendations [45]; similar observations hold in the online world. For example, 66% of people on social sites have asked friends or followers to help them make a decision, 88% of links that 14-24 year olds clicked were sent to them by a friend, and 78% of consumers trust peer recommendations over ads and Google SERPs. These intuitions motivate a new research direction of recommendation, social recommendation, which aims to take advantage of social relations to improve the performance of recommendation. Formally, a social recommender system is to predict missing values in the user-content interaction matrix R based on information from the user-user relation matrix S and the observed values in R [51]. The main evidence that social relations can improve recommendation performance comes from social correlation theory, which suggests that a user's preference is similar to or influenced by their directly connected friends [51].
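To make the formal setup concrete, the following is a minimal sketch of the simplest predictor that social correlation theory suggests: a missing entry of R is filled with the mean rating that the user's directly connected friends in S gave to the same item. This is an illustration only and not one of the surveyed methods; the function name `predict_from_friends`, the use of `None` for missing values, and the toy matrices are all assumptions.

```python
from typing import List, Optional


def predict_from_friends(
    R: List[List[Optional[float]]],
    S: List[List[int]],
    user: int,
    item: int,
) -> Optional[float]:
    """Predict R[user][item] as the mean rating of `item` among connected users.

    Returns None when no connected user has rated the item.
    """
    friend_ratings = [
        R[v][item]
        for v in range(len(S))
        if v != user and S[user][v] != 0 and R[v][item] is not None
    ]
    if not friend_ratings:
        return None
    return sum(friend_ratings) / len(friend_ratings)


# Toy user-content interaction matrix R (4 users x 3 items); None marks a missing value.
R = [
    [5.0, None, 3.0],
    [4.0, 2.0, None],
    [None, 4.0, 1.0],
    [1.0, 5.0, None],
]

# Toy user-user relation matrix S; a nonzero entry means the two users are connected.
S = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

# User 0 has not rated item 1; her friends (users 1 and 2) rated it 2.0 and 4.0.
prediction = predict_from_friends(R, S, user=0, item=1)
```

A practical system would combine this social signal with the user's own observed ratings rather than rely on friends alone; the sketch only shows how S and the observed part of R jointly determine a prediction.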
In line with this theory, social media users rarely make decisions independently and usually seek advice from their friends before making purchase decisions. Because social relations may provide both similar and familiar evidence for users, MoleTrust uses socially connected users to replace similar users in the traditional user-based collaborative filtering method for recommendation [39]. Social correlation theory indicates that a user's preference should be similar to that of her/his social network. Ensemble methods predict a missing value for a given user as a linear combination of ratings from the user and her/his social network, based on the traditional matrix factorization CF method, with the intuition that users and their social networks should have similar ratings on the same items [50]. Regularization methods instead add regularization terms to force the preference of a user close to that of users in her/his social network under the matrix factorization CF method. For example, SocialMF defines a regularization term to force the preference of a user to be close to the average preference of the user's social network [22], and SoReg uses social regularization to force the preferences of two connected users close [35].

Feature Selection

One characteristic of user generated content in social media is its high dimensionality; for example, there are tens of thousands of terms in tweets or pixels in Flickr photos. Traditional data mining tasks such as classification and clustering may fail due to the curse of dimensionality. Feature selection has been proven to be an effective way to handle high-dimensional data for efficient data mining [31]. As mentioned above, user generated content is linked due to the availability of social relations, which poses challenges to traditional feature selection algorithms that are typically designed for IID data.
The formal definition of feature selection for user generated content in social media is stated in [52]: we aim to develop a selector which selects a subset of the most relevant features from F on the content-feature matrix C with its social context S and R. LinkedFS is proposed in [52] as a feature selection framework for user generated content with social context, based on social correlation theory. Four types of relations, i.e., coPost, coFollowing, coFollowed and Following, are extracted from the social context S and R of user generated content C. Social correlation theory suggests that linked users are likely to share similar topics. Based on social correlation theory, LinkedFS turns these four types of relations into four corresponding hypotheses that can affect feature selection with linked data. For example, the Following hypothesis assumes that one user i follows another user j because i shares j's interests, so their user generated content is more likely to be similar in terms of topics; hence LinkedFS models following relations mathematically by forcing the topics of two users with
following relations close to each other. LinkedFS jointly incorporates group Lasso with the regularization term to model each type of relation for feature selection.

Sentiment Analysis

Nowadays social media services such as Twitter and Facebook are increasingly used by online users to share and exchange opinions, providing rich resources to understand public opinions. For example, in [3], a simple model exploiting Twitter sentiment and content outperforms market-based predictors in forecasting box-office revenues for movies; public mood as measured from a large-scale collection of tweets obtains an accuracy of 86.7% in predicting the daily up and down changes in the closing values of the DJIA [8]. Therefore sentiment analysis for such opinion-rich social media data has attracted increasing attention in recent years [46; 20]. Formally, sentiment analysis for user generated content with social relations is to obtain a predictor from the content-feature matrix C with its social context S and R, which can automatically label the sentiment polarity of an unseen post.

Social correlation theory indicates that the sentiments of two linked users are likely to be similar. In [48], graphical models are proposed to incorporate social network information to improve user-level sentiment classification of different topics, based on two observations: (1) user pairs in which at least one party links to the other are more likely to hold the same sentiment, and (2) two users with the same sentiment are more likely to have at least one link to the other than two users with different sentiments. Social correlation theory suggests that social relations are a kind of sentiment correlation. In [46], the authors propagate sentiment labels of tweets via user-user social relations S and user-tweet relations R to assign sentiment labels to unlabeled tweets. In [20], a tweet-tweet correlation network is built from S and R based on social correlation theory.
For example, tweets from users with following relations should be correlated, as suggested by social correlation theory. Two tweets linked in the tweet-tweet correlation network are likely to share similar sentiments; hence the proposed framework SANT adds a graph regularization term to the Lasso classifier to force the sentiments of two correlated tweets close to each other.

4.4 Discussion

In reviewing state-of-the-art algorithms that exploit social theories in mining social media, we understand that they aim to find mathematical explanations of social theories for computational models. We notice that the algorithms apply social theories in similar ways, namely feature engineering, constraint generating and objective defining.

Figure 6: Social Theories in Social Media Mining. (Table of social media mining tasks, grouped as user-related, relation-related and content-related, against the three ways of applying social theories: feature engineering, constraint generating and objective defining.)

Feature Engineering: It uses social theories to extract features for computational models. For example, in link prediction, the confounding effect in social correlation theory suggests that people who are physically close have a better likelihood of forming new links, and new features from users' mobility information are extracted in [59] to improve link prediction; triad features based on status theory are extracted as transfer features to infer social ties by transferring knowledge from the source network to the target network [53].

Constraint Generating: It generates constraints from social theories for computational models. Regularization is one of the most popular ways to implement constraint generating.
For example, SocialMF in social recommendation adds a social regularization term to force the preference of a user close to that of her/his social network to capture social correlation theory [22]; and hTrust adds a homophily regularization term to capture the homophily effect and mitigate the sparsity problem in link prediction [49].

Objective Defining: It uses social theories to define the objectives of the computational models. For example, two objectives are defined from balance theory to detect communities in signed networks [2]; and the user classification task is to make the labels of a user similar to those of her/his social network [15].

Instead of brute-force search, social theories can guide us to extract relevant features via feature engineering, to generate constraints via constraint generating, and to define objectives via objective defining for computational models. The algorithms reviewed earlier that exploit social theories in various social media mining tasks are summarized in Figure 6. We notice that for the same task, social theories can be exploited in different ways. For example, for link prediction, social theories are applied via feature engineering, constraint generating and objective defining.

5. OPEN ISSUES AND FUTURE RESEARCH DIRECTIONS

5.1 More Social in Mining Social Media Data

Some social theories have been proven to be applicable to social media data, which encourages us to put social in social media mining. Integrating some social theories with computational models advances various social media mining tasks and has attracted increasing attention. The exciting progress not only proves that the direction of integrating social theories in mining social media data is appealing but also suggests that we should put more social in social media mining. In this article, we review the state-of-the-art algorithms that employ social correlation theory, balance theory and status theory in various social media mining tasks.
These theories are just illustrative examples, and more social theories could be applicable and employed, such as small world theory [74], as shown in recent efforts to investigate and verify more social theories for social media data. Some of these efforts have already made initial progress, such as those on structural hole theory [9] and weak tie theory [16]. A person is said to span a structural hole in a social network if he or she is linked to people in parts of the network that
are otherwise not well connected to one another [9]. Tang et al. [56] employ structural hole theory in the problem of social tie prediction, while Lou and Tang confirm the importance of structural holes in information diffusion with social media data, and show that mining structural holes can benefit various social media mining tasks such as community detection and link prediction [33]. Weak tie theory suggests that more novel information flows to individuals through weak rather than strong ties [16]. Recently researchers have found that the weak ties of a user are helpful in predicting the preference of the user for user classification [54] and social recommendation [71].

5.2 New Social Theories

There is no doubt that social media data is a new type of social data and is much more complicated than the data social sciences use to study social theories. It is highly possible that new social theories can be discovered from social media data to make meaningful progress on important problems in social media mining; however, that progress requires serious engagement of both computer scientists and social scientists [61]. Data availability is still a challenging problem for social scientists. The data required to address many problems of interest to social scientists remain difficult to assemble, and it has been impossible to collect observational data on the scale of hundreds of millions, or even tens of thousands, of individuals [61]. Social media provides a virtual world for users' online activities and makes it possible for social scientists to observe social behavior and interaction data of hundreds of millions of users. However, social media data is too big to be directly handled by social scientists. On the other hand, computer scientists can employ data mining and machine learning techniques to handle big social media data, but we lack the necessary theories to help us understand social media data better.
For example, without a better understanding of social media data, computer scientists may waste a lot of time in feature engineering, which is the key to the success of many real-world applications [12]. Therefore engagement of both computer scientists and social scientists in social media data is truly mutually beneficial. Computer scientists can take advantage of social theories to mine social media data and provide computational tools that are of great potential benefit to social scientists, while social scientists can make use of computational tools to handle social media data and develop new social theories to help computer scientists provide better computational tools.

6. CONCLUSION

The social nature of social media data calls for new techniques and tools and cultivates a new field, social media mining. Social theories from social sciences have been proven to be applicable to mining social media. Integrating social theories with computational models is becoming an interesting way of mining social media data and is making exciting progress in various social media mining tasks. In this article, we review three key social theories, i.e., social correlation theory, balance theory and status theory, in mining social media data. In detail, we introduce basic concepts, verification methods, interesting findings and the state-of-the-art algorithms that exploit these social theories in social media mining tasks, which can be categorized into feature engineering, constraint generating and objective defining. As future directions, more existing social theories could be employed or new social theories could be discovered to advance social media mining.

Acknowledgments

This work is, in part, supported by NSF (#IIS ), ARO (#025071), ONR (N ) and a research fund from Yahoo Faculty Research and Engagement Program.

7. REFERENCES

[1] E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne. Finding high-quality content in social media. In WSDM.
[2] A. Amelio and C. Pizzuti. Community mining in signed networks: a multiobjective approach. In ASONAM.
[3] S. Asur and B. A. Huberman. Predicting the future with social media. In WI-IAT.
[4] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. The role of social networks in information diffusion. In WWW.
[5] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science.
[6] S. Y. Bhat and M. Abulaish. Community-based features for identifying spammers in online social networks. In ASONAM. ACM.
[7] H. Bisgin, N. Agarwal, and X. Xu. Investigating homophily in online social networks. In WI-IAT.
[8] J. Bollen, H. Mao, and X. Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1-8.
[9] R. S. Burt. Structural holes: The social structure of competition. Harvard University Press.
[10] K.-Y. Chiang, N. Natarajan, A. Tewari, and I. S. Dhillon. Exploiting longer cycles for link prediction in signed networks. In CIKM.
[11] J. A. Davis. Clustering and structural balance in graphs. Human Relations.
[12] P. Domingos. A few useful things to know about machine learning. Communications of the ACM.
[13] T. DuBois, J. Golbeck, and A. Srinivasan. Predicting trust and distrust in social networks. In SocialCom.
[14] H. Gao, J. Tang, and H. Liu. Exploring social-historical ties on location-based social networks. In ICWSM.
[15] L. Getoor and C. P. Diehl. Link mining: a survey. ACM SIGKDD Explorations Newsletter.
[16] M. Granovetter. The strength of weak ties. JSTOR.
[17] F. Heider. Attitudes and cognitive organization. The Journal of Psychology, 1946.
[18] C.-J. Hsieh, K.-Y. Chiang, and I. S. Dhillon. Low rank modeling of signed networks. In KDD.
[19] X. Hu, J. Tang, Y. Zhang, and H. Liu. Social spammer detection in microblogging. In IJCAI.
[20] X. Hu, L. Tang, J. Tang, and H. Liu. Exploiting social relations for sentiment analysis in microblogging. In WSDM.
[21] B. Huberman, D. M. Romero, and F. Wu. Social networks that matter: Twitter under the microscope. First Monday.
[22] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In RecSys.
[23] D. Jensen and J. Neville. Linkage and autocorrelation cause feature selection bias in relational learning. In ICML.
[24] I. Kahanda and J. Neville. Using transactional information to predict link strength in online social networks. In ICWSM.
[25] D. Kim, D. Kim, E. Hwang, and S. Rho. TwitterTrends: a spatio-temporal trend detection and related keywords recommendation scheme. Multimedia Systems, 2014.
[26] M. Kosinski, D. Stillwell, and T. Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS.
[27] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW.
[28] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In CHI.
[29] F. Li and M.-H. Hsieh. An empirical study of clustering behavior of spammers and group-based anti-spam strategies. In CEAS.
[30] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. JASIST.
[31] H. Liu and H. Motoda. Computational methods of feature selection. CRC Press.
[32] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML.
[33] T. Lou and J. Tang. Mining structural hole spanners through information diffusion in social networks. In WWW.
[34] Q. Lu and L. Getoor. Link-based classification. In ICML.
[35] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In WSDM.
[36] S. A. Macskassy and F. Provost. A simple relational classifier. In MRDM.
[37] S. A. Macskassy and F. Provost. Classification in networked data: A toolkit and a univariate case study. JMLR.
[38] P. Massa. A survey of trust use and modeling in real online systems. Trust in E-services: Technologies, Practices and Challenges.
[39] P. Massa and P. Avesani. Trust-aware collaborative filtering for recommender systems. In CoopIS, DOA, and ODBASE.
[40] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology.
[41] A. Mislove, B. Viswanath, K. P. Gummadi, and P. Druschel. You are who you know: inferring user profiles in online social networks. In WSDM.
[42] J. Neville and D. Jensen. Leveraging relational autocorrelation with latent group models. In MRDM.
[43] M. E. Newman and M. Girvan. Finding and evaluating community structure in networks. PRE, 69(2):026113.
[44] S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. Community detection in social media. DMKD.
[45] R. R. Sinha and K. Swearingen. Comparing recommendations made by online systems and friends. In DELOS.
[46] M. Speriosu, N. Sudan, S. Upadhyay, and J. Baldridge. Twitter polarity classification with label propagation over lexical links and the follower graph. In ULNLP.
[47] G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social networks. In ACSAC.
[48] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, and P. Li. User-level sentiment analysis incorporating social networks. In KDD.
[49] J. Tang, H. Gao, X. Hu, and H. Liu. Exploiting homophily effect for trust prediction. In WSDM.
[50] J. Tang, H. Gao, and H. Liu. mTrust: discerning multi-faceted trust in a connected world. In WSDM.
[51] J. Tang, X. Hu, and H. Liu. Social recommendation: a review. SNAM.
[52] J. Tang and H. Liu. Feature selection with linked data in social media. In SDM.
[53] J. Tang, T. Lou, and J. Kleinberg. Inferring social ties across heterogeneous networks. In WSDM.
[54] L. Tang and H. Liu. Relational learning via latent social dimensions. In KDD.
[55] L. Tang and H. Liu. Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2010.
[56] W. Tang, H. Zhuang, and J. Tang. Learning to infer social ties in large networks. In PKDD.
[57] B. Taskar, P. Abbeel, M.-F. Wong, and D. Koller. Label and link prediction in relational data. In SRL.
[58] V. Traag and J. Bruggeman. Community detection in networks with positive and negative links. PRE, 80(3):036115.
[59] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabási. Human mobility, social ties, and link prediction. In KDD.
[60] X. Wang, L. Tang, H. Gao, and H. Liu. Discovering overlapping groups in social media. In ICDM.
[61] D. J. Watts. Computational social science: Exciting progress and future directions. Winter Issue of The Bridge on Frontiers of Engineering.
[62] S. Webb, J. Caverlee, and C. Pu. Social honeypots: Making friends with a spammer near you. In CEAS.
[63] J. Weng, E.-P. Lim, J. Jiang, and Q. He. TwitterRank: finding topic-sensitive influential twitterers. In WSDM.
[64] R. Xiang and J. Neville. Collective inference for network data with copula latent Markov networks. In WSDM. ACM.
[65] R. Xiang, J. Neville, and M. Rogati. Modeling relationship strength in online social networks. In WWW.
[66] X. Xu, N. Yuruk, Z. Feng, and T. A. Schweiger. SCAN: a structural clustering algorithm for networks. In KDD.
[67] J. Yang and J. Leskovec. Modeling information diffusion in implicit networks. In ICDM.
[68] S.-H. Yang, A. J. Smola, B. Long, H. Zha, and Y. Chang. Friend or frenemy?: predicting signed ties in social networks. In SIGIR.
[69] M. Ye, X. Liu, and W.-C. Lee. Exploring social influence for recommendation: a generative model approach. In SIGIR.
[70] R. Zafarani, M. A. Abbasi, and H. Liu. Social Media Mining: An Introduction. Cambridge University Press.
[71] X. Zhang, J. Cheng, T. Yuan, B. Niu, and H. Lu. TopRec: domain-specific recommendation through community topic mining in social network. In WWW.
[72] S. Zhu, K. Yu, Y. Chi, and Y. Gong. Combining content and link for classification using matrix factorization. In SIGIR.
[73] Y. Zhu, X. Wang, E. Zhong, N. N. Liu, H. Li, and Q. Yang. Discovering spammers in social networks. In AAAI.
[74] D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 1998.
Introdcing Revene Cycle Optimization! STI Provides More Options Than Any Other Software Vendor ChartMaker Clinical 3.7 2011 Amblatory EHR + Cardiovasclar Medicine + Child Health STI Provides More Choices
A taxonomy of knowledge management software tools: origins and applications
Evalation and Program Planning 25 2002) 183±190 www.elsevier.com/locate/evalprogplan A taxonomy of knowledge management software tools: origins and applications Peter Tyndale* Kingston University Bsiness
A guide to safety recalls in the used vehicle industry GUIDE
A gide to safety recalls in the sed vehicle indstry GUIDE Definitions Aftermarket parts means any prodct manfactred to be fitted to a vehicle after it has left the vehicle manfactrer s prodction line.
Designing a TCP/IP Network
C H A P T E R 1 Designing a TCP/IP Network The TCP/IP protocol site defines indstry standard networking protocols for data networks, inclding the Internet. Determining the best design and implementation
Designing an Authentication Strategy
C H A P T E R 1 4 Designing an Athentication Strategy Most organizations need to spport seamless access to the network for mltiple types of sers, sch as workers in offices, employees who are traveling,
Optimal Trust Network Analysis with Subjective Logic
The Second International Conference on Emerging Secrity Information, Systems and Technologies Optimal Trst Network Analysis with Sbjective Logic Adn Jøsang UNIK Gradate Center, University of Oslo Norway
Planning a Managed Environment
C H A P T E R 1 Planning a Managed Environment Many organizations are moving towards a highly managed compting environment based on a configration management infrastrctre that is designed to redce the
Practical Tips for Teaching Large Classes
Embracing Diversity: Toolkit for Creating Inclsive, Learning-Friendly Environments Specialized Booklet 2 Practical Tips for Teaching Large Classes A Teacher s Gide Practical Tips for Teaching Large Classes:
Effective governance to support medical revalidation
Effective governance to spport medical revalidation A handbook for boards and governing bodies This docment sets ot a view of the core elements of effective local governance of the systems that spport
CRM Customer Relationship Management. Customer Relationship Management
CRM Cstomer Relationship Management Kenneth W. Thorson Tax Commissioner Virginia Department of Taxation Discssion Areas TAX/AMS Partnership Project Backgrond Cstomer Relationship Management Secre Messaging
Towers Watson Manager Research
Towers Watson Manager Research How we se fnd performance data Harald Eggerstedt 13. März 212 212 Towers Watson. All rights reserved. Manager selection at Towers Watson The goal is to find managers that
Apache Hadoop. The Scalability Update. Source of Innovation
FILE SYSTEMS Apache Hadoop The Scalability Update KONSTANTIN V. SHVACHKO Konstantin V. Shvachko is a veteran Hadoop developer. He is a principal Hadoop architect at ebay. Konstantin specializes in efficient
EMC ViPR. Concepts Guide. Version 1.1.0 302-000-482 02
EMC ViPR Version 1.1.0 Concepts Gide 302-000-482 02 Copyright 2013-2014 EMC Corporation. All rights reserved. Pblished in USA. Pblished Febrary, 2014 EMC believes the information in this pblication is
Chapter 1. LAN Design
Chapter 1 LAN Design CCNA3-1 Chapter 1 Note for Instrctors These presentations are the reslt of a collaboration among the instrctors at St. Clair College in Windsor, Ontario. Thanks mst go ot to Rick Graziani
Planning an Active Directory Deployment Project
C H A P T E R 1 Planning an Active Directory Deployment Project When yo deploy the Microsoft Windows Server 2003 Active Directory directory service in yor environment, yo can take advantage of the centralized,
Closer Look at ACOs. Putting the Accountability in Accountable Care Organizations: Payment and Quality Measurements. Introduction
Closer Look at ACOs A series of briefs designed to help advocates nderstand the basics of Accontable Care Organizations (ACOs) and their potential for improving patient care. From Families USA Janary 2012
EMC PowerPath Virtual Appliance
EMC PowerPath Virtal Appliance Version 1.2 Administration Gide P/N 302-000-475 REV 01 Copyright 2013 EMC Corporation. All rights reserved. Pblished in USA. Pblished October, 2013 EMC believes the information
Regular Specifications of Resource Requirements for Embedded Control Software
Reglar Specifications of Resorce Reqirements for Embedded Control Software Rajeev Alr and Gera Weiss University of Pennsylvania Abstract For embedded control systems a schedle for the allocation of resorces
A Contemporary Approach
BORICP01.doc - 1 Second Edition Edcational Psychology A Contemporary Approach Gary D. Borich The University of Texas at Astin Martin L. Tombari University of Denver (This pblication may be reprodced for
USA Funds Life Skills Course Summaries. Financial Aid and Paying for College. 101 How Will I Pay for My Higher Education?
USA Fnds Life Skills Corse Smmaries Financial Aid and Paying for College 101 How Will I Pay for My Higher Edcation? Teaches stdents how to find resorces and fnds to finance their higher edcation by examining
Spectrum Balancing for DSL with Restrictions on Maximum Transmit PSD
Spectrm Balancing for DSL with Restrictions on Maximm Transmit PSD Driton Statovci, Tomas Nordström, and Rickard Nilsson Telecommnications Research Center Vienna (ftw.), Dona-City-Straße 1, A-1220 Vienna,
Candidate: Cassandra Emery. Date: 04/02/2012
Market Analyst Assessment Report 04/02/2012 www.resorceassociates.com To Improve Prodctivity Throgh People. 04/02/2012 Prepared For: Resorce Associates Prepared by: John Lonsbry, Ph.D. & Lcy Gibson, Ph.D.,
NAPA TRAINING PROGRAMS FOR:
NAPA TRAINING PROGRAMS FOR: Employees Otside Sales Store Managers Store Owners See NEW ecatalog Inside O V E R V I E W 2010_StoreTrainingBrochre_SinglePg.indd 1 5/25/10 12:39:32 PM Welcome 2010 Store Training
Purposefully Engineered High-Performing Income Protection
The Intelligent Choice for Disability Income Insrance Prposeflly Engineered High-Performing Income Protection Keeping Income strong We engineer or disability income prodcts with featres that deliver benefits
How To Link Data Across Agencies
Rasterized 300 dpi Linking Data across Agencies: States That Are Making It Work Updated March 2010 By: Rebecca Carson and Elizabeth Laird, Data Qality Campaign; Elizabeth Gaines and Thaddes Ferber, The
Planning and Implementing An Optimized Private Cloud
W H I T E PA P E R Intelligent HPC Management Planning and Implementing An Optimized Private Clod Creating a Clod Environment That Maximizes Yor ROI Planning and Implementing An Optimized Private Clod
Anatomy of SIP Attacks
Anatomy of SIP Attacks João M. Ceron, Klas Steding-Jessen, and Cristine Hoepers João Marcelo Ceron is a Secrity Analyst at CERT.br/NIC.br. He holds a master s degree from Federal University of Rio Grande
Sickness Absence in the UK: 1984-2002
Sickness Absence in the UK: 1984-2002 Tim Barmby (Universy of Drham) Marco Ecolani (Universy of Birmingham) John Treble (Universy of Wales Swansea) Paper prepared for presentation at The Economic Concil
Curriculum for the course GENDER EQUALITY TRAINING FOR DECISION-MAKERS, EDUCATORS AND LEADERS OF NGOs
UDK 342.7 Me -13 EU Socrates project WO-MEN: GENDER EQUALITY CREATES DEMOCRACY No. 109771-CP-1-2003-1-LT-GRUNDTVIG-G1 Transnational Cooperation Project Co-financed by the Eropean Commission DG Edcation
On the urbanization of poverty
On the rbanization of poverty Martin Ravallion 1 Development Research Grop, World Bank 1818 H Street NW, Washington DC, USA Febrary 001; revised Jly 001 Abstract: Conditions are identified nder which the
The Institute Of Commercial Management. Prospectus. Start Your Career Here! www.icm.ac.uk [email protected]
The Institte Of Commercial Management Prospects Start Yor Career Here! www.icm.ac.k [email protected] The fondation of every state is the edcation of it s yoth. Diogenes Laertis Welcome... Althogh we are
