Mining Social Media: A Brief Introduction
INFORMS 2012 © 2012 INFORMS
Pritam Gundecha, Huan Liu
Arizona State University, Tempe, Arizona
Abstract
The pervasive use of social media has generated unprecedented amounts of social data. Social media provides an easily accessible platform for users to share information. Mining social media has the potential to extract actionable patterns that can benefit businesses, users, and consumers. Social media data are vast, noisy, unstructured, and dynamic, and thus pose novel challenges. This tutorial reviews the basics of data mining and social media, introduces representative research problems in mining social media, illustrates the application of data mining to social media with examples, and describes real-world projects that mine social media for humanitarian assistance and disaster relief.
Keywords: social media; data mining; social data; social media mining; social networking sites; blogging; microblogging; crowdsourcing; HADR; privacy; trust
1. Introduction
Data mining research has successfully produced numerous methods, tools, and algorithms for handling large amounts of data to solve real-world problems. Traditional data mining has become an integral part of many application domains, including bioinformatics, data warehousing, business intelligence, predictive analytics, and decision support systems. The primary objectives of the data mining process are to handle large-scale data effectively, extract actionable patterns, and gain insightful knowledge. Because social media is widely used for various purposes, vast amounts of user-generated data exist and can be made available for data mining. Data mining of social media can expand researchers' ability to understand new phenomena arising from the use of social media, and it can improve business intelligence to provide better services and develop innovative opportunities.
For example, data mining techniques can help identify the influential people in the vast blogosphere, detect implicit or hidden groups in a social networking site, sense user sentiments for proactive planning, develop recommendation systems for tasks ranging from buying specific products to making new friends, understand network evolution and changing entity relationships, protect user privacy and security, or build and strengthen trust among users or between users and entities. Mining social media is a burgeoning multidisciplinary area where researchers of different backgrounds can make important contributions to social media research and development. The objective of this tutorial is to introduce social media, data mining, and their confluence: mining social media. We attempt to achieve this goal by presenting representative and interesting research issues and important social media tasks based on our experience and research. This tutorial first reviews data mining, social media and its types, and the importance of social media mining. In Section 2, we briefly introduce representative issues in social media mining. In Section 3, we highlight the impact of social media mining using three examples based on our current research. Section 4 illustrates how social media mining is applied in real-world applications, namely two projects on humanitarian assistance and disaster relief (HADR) carried out in the Data Mining and Machine Learning Laboratory (DMML) at Arizona State University (ASU). We conclude this tutorial in Section 5.
Tutorials in Operations Research, © 2012 INFORMS
1.1. Data Mining
Data mining (Tan et al. [55]) is a process of discovering useful or actionable knowledge in large-scale data. Data mining is also referred to as knowledge discovery from data (KDD) (Han et al. [24]), which describes the typical process of extracting useful information from raw data. The KDD process broadly consists of the following tasks: data preprocessing, data mining, and postprocessing. These steps need not be separate tasks and can be combined. Data mining is an integral part of many related fields, including statistics, machine learning, pattern recognition, database systems, visualization, data warehousing, and information retrieval (Han et al. [24]).
Data mining algorithms are broadly classified into supervised, unsupervised, and semisupervised learning algorithms. Classification is a common example of a supervised learning approach. For supervised learning algorithms, a given data set with known class labels is typically divided into two parts: a training data set and a test data set. Supervised algorithms build classification models from the training data and use the learned models for prediction. To evaluate a classification model's performance, the model is applied to the test data to obtain its classification accuracy. Typical supervised learning methods include decision tree induction, k-nearest neighbors, naive Bayes classification, and support vector machines.
Unsupervised learning algorithms are designed for data without class labels. Clustering is a common example of unsupervised learning. For a given task, unsupervised learning algorithms build the model based on the similarity or dissimilarity between data objects, which can be measured using proximity measures including Euclidean distance, Minkowski distance, and Mahalanobis distance.
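To make the train/test protocol and the Minkowski distance concrete, here is a minimal sketch of a k-nearest-neighbor classifier evaluated by its accuracy on held-out test data. The toy two-feature data set, the helper names (`minkowski`, `knn_predict`, `accuracy`), and the choice k = 3 are illustrative assumptions, not taken from the tutorial.

```python
from collections import Counter


def minkowski(a, b, p=2.0):
    """Minkowski distance; p=2 gives Euclidean distance, p=1 Manhattan distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train, query, k=3, p=2.0):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: minkowski(item[0], query, p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3, p=2.0):
    """Fraction of test instances whose predicted label matches the known label."""
    correct = sum(knn_predict(train, x, k, p) == y for x, y in test)
    return correct / len(test)


# Hypothetical labeled data set, already split into training and test parts.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b")]
test = [((1.1, 0.9), "a"), ((4.0, 4.0), "b"), ((2.4, 2.6), "b")]

print(accuracy(train, test))
```

On this toy split, the two test points that sit inside a cluster are classified correctly, while the borderline point (2.4, 2.6) gets two "a" votes among its three nearest neighbors, so the test accuracy is 2/3. The model is only as good as its performance on data it was not built from, which is why the test set is kept separate.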
Other proximity measures, such as the simple matching coefficient, the Jaccard coefficient, cosine similarity, and Pearson's correlation, can also be used to calculate similarity or dissimilarity between data objects. K-means, hierarchical clustering (agglomerative or divisive methods), and density-based clustering are typical examples of unsupervised learning.
Semisupervised learning algorithms are most applicable when there are small amounts of labeled data and large amounts of unlabeled data. Two typical types of semisupervised learning are semisupervised classification and semisupervised clustering. The former uses labeled data for classification and unlabeled data to further refine the classification boundaries; the latter uses labeled data to guide clustering. Cotraining is a representative semisupervised learning algorithm. Active learning algorithms allow users to play an active role in the learning process via labeling. Typically, users are domain experts, and their skills are employed to label data instances for which a machine learning algorithm is least confident about its classification. Minimum marginal hyperplane and maximum curiosity are two popular active learning algorithms. Data mining includes other techniques, such as association rule mining, anomaly detection, feature selection, instance selection, and visual analytics. Additional details on these data mining techniques can be found in Han et al. [24], Tan et al. [55], Witten et al. [59], Zhao and Liu [63], and Liu and Motoda [37].
Social Media
Social media (Kaplan and Haenlein [28]) is defined as a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content. Social media is a conglomerate of different types of social media sites, including traditional media such as newspaper, radio, and television and nontraditional media such as Facebook, Twitter, etc.
Table 1 shows the characteristics of different types of social media. Social media gives users an easy-to-use way to communicate and network with each other on an unprecedented scale and at rates unseen in traditional media. The popularity of social media continues to grow exponentially, resulting in an evolution of social networks, blogs, microblogs, location-based social networks (LBSNs), wikis, social bookmarking applications, social news, media (text, photo, audio, and video) sharing, product and business review sites, etc.
Table 1. Characteristics of different types of social media.
Online social networking: Online social networks are Web-based services that allow individuals and communities to connect with real-world friends and acquaintances online. Users interact with each other through status updates, comments, media sharing, messages, etc. (e.g., Facebook, Myspace, LinkedIn).
Blogging: A blog is a journal-like website for users, aka bloggers, to contribute textual and multimedia content, arranged in reverse chronological order. Blogs are generally maintained by an individual or by a community (e.g., Huffington Post, Business Insider, Engadget).
Microblogging: Microblogs can be considered the same as blogs but with limited content (e.g., Twitter, Tumblr, Plurk).
Wikis: A wiki is a collaborative editing environment that allows multiple users to develop Web pages (e.g., Wikipedia, Wikitravel, Wikihow).
Social news: Social news refers to the sharing and selection of news stories and articles by a community of users (e.g., Digg, Slashdot, Reddit).
Social bookmarking: Social bookmarking sites allow users to bookmark Web content for storage, organization, and sharing (e.g., Delicious, StumbleUpon).
Media sharing: Media sharing is an umbrella term that refers to the sharing of a variety of media on the Web, including video, audio, and photo (e.g., YouTube, Flickr, UstreamTV).
Opinion, reviews, and ratings: The primary function of such sites is to collect and publish user-submitted content in the form of subjective commentary on existing products, services, entertainment, businesses, places, etc. Some of these sites also provide product reviews (e.g., Epinions, Yelp, Cnet).
Answers: These sites provide a platform for users seeking advice, guidance, or knowledge to ask questions. Other users from the community can answer these questions based on previous experiences, personal opinions, or relevant research. Answers are generally judged using ratings and comments (e.g., Yahoo! Answers, WikiAnswers).
Facebook, the social networking site, recorded more than 845 million active users as of December (accessed March 2012). By this measure, only two countries in the world, China (approximately 1.3 billion) and India (approximately 1.1 billion), have larger populations than Facebook. Together, Facebook and Twitter have accrued more than 1.2 billion users (these are not unique users, because the two sites have many overlapping users), more than three times the population of the United States and more than the population of any continent except Asia.
Mining Social Media
Vast amounts of user-generated content are created on social media sites every day. This trend is likely to continue, with exponentially more content in the future. Hence, it is critical for producers, consumers, and service providers to figure out how to manage and use massive user-generated data. Social media growth is driven by these challenges: (1) How can a user be heard? (2) Which source of information should a user use? (3) How can user experience be improved? Answers to these questions are hidden in social media data. These challenges present ample opportunities for data miners to develop new algorithms and methods for social media.
Data generated on social media sites differ from the conventional attribute-value data of classic data mining. Social media data are largely user-generated content on social media sites. Social media data are vast, noisy, distributed, unstructured, and dynamic. These characteristics challenge data miners to invent new, efficient techniques and algorithms. For example, Facebook and Twitter report Web traffic from approximately 149 million and 90 million unique U.S. visitors per month, respectively. According to the video sharing site YouTube, more than 4 billion videos are viewed per day, and 60 hours of video are uploaded every minute. The photo sharing site Flickr, as of August 2011, hosts more than 6 billion photos. The Web-based, collaborative, multilingual Wikipedia hosts over 20 million articles, attracting over 365 million readers (all figures accessed March 2012).
Depending on the social media platform, social media data can often be very noisy. Removing the noise from the data is essential before effective mining can be performed. Researchers have noticed that spammers (Yardi et al. [61], Chu et al. [12]) generate more data than legitimate users. Social media data are distributed because there is no central authority that maintains data from all social media sites. Distributed social media data make understanding the information flows on social media a daunting task for researchers. Social media data are often unstructured. Making meaningful observations based on unstructured data from various data sources is a big challenge.
For example, social media sites like LinkedIn, Facebook, and Flickr serve different purposes and meet different needs of users. Social media sites are dynamic and continuously evolving. For example, Facebook recently introduced many concepts, including a user's timeline, the creation of in-groups for a user, and numerous user privacy policy changes. Because social media sites evolve continuously and rapidly, the dynamic nature of social media data is a significant challenge.
Many additional interesting questions related to human behavior can be studied using social media data. Social media can also help advertisers find influential people to maximize the reach of their products within an advertising budget. Social media can help sociologists uncover human behavior, such as the in-group and out-group behaviors of users. Recently, social media was reported to play an instrumental role in facilitating mass movements such as the Arab Spring and Occupy Wall Street.
2. Issues in Mining Social Media
In this section, we introduce some representative research issues in mining social media.
Community Analysis
A community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group. Depending on the context, a community is also referred to as a group, cluster, cohesive subgroup, or module. Communities can be observed via connections in social media because social media allows people to expand their social networks online (Tang and Liu [56]). Social media enables people to connect with friends and find new users with similar interests. Communities found in social media are broadly classified into explicit and implicit groups. Explicit groups are formed by user subscriptions, whereas implicit groups emerge naturally through interactions. Community analysts generally face issues such as community detection, formation, and evolution.
Community detection often refers to the extraction of implicit groups in a network. The main challenges of community detection are that (1) the definition of a community can be subjective, and (2) the lack of ground truth makes community evaluation difficult. Tang and Liu [56] divided community detection methods into four categories: (1) node-centric community detection, where each node satisfies certain properties such as complete mutuality, reachability, node degrees, or the frequency of within-group and outside-group ties (examples include cliques, k-cliques, and k-clubs); (2) group-centric community detection, where a group needs to satisfy certain properties (for example, a minimum group density); (3) network-centric community detection, where groups are formed by partitioning the network into disjoint sets (examples are spectral clustering and modularity maximization); and (4) hierarchy-centric community detection, where the goal is to build a hierarchical structure of communities, which allows a network to be analyzed at different resolutions (representative methods are divisive clustering and agglomerative clustering).
Social media networks are highly dynamic. Communities can expand, shrink, or dissolve in dynamic networks. Community evolution aims to discover the patterns of a community over time in the presence of dynamic network interactions. Backstrom et al. [8] found that the more friends you have in a group, the more likely you are to join it, and that communities with cliques grow more slowly than those that are not tightly connected.
Sentiment Analysis and Opinion Mining
Sentiment analysis and opinion mining aim to automatically extract opinions expressed in user-generated content. Sentiment analysis and opinion mining tools allow businesses to understand product sentiments, brand perception, and new product perception, and to manage their reputation. These tools help users perceive product opinions or sentiments on a global scale. Many social media sites report user opinions of products in many different formats, so monitoring the opinions related to a particular company or product across social media sites is a new challenge. Sentiment analysis is hard because the language used to create content is ambiguous. The major steps of sentiment analysis are (1) finding relevant documents, (2) finding relevant sections, (3) finding the overall sentiment, (4) quantifying the sentiment, and (5) aggregating all sentiments to form an overview. The basic components of an opinion are (1) an object on which the opinion is expressed, (2) the opinion expressed on the object, and (3) the opinion holder. Objects are generally represented as a finite set of features, where each feature represents a finite set of synonymous words or phrases. Opinion mining tasks can be performed at the document level (Turney [58], Pang et al. [49]), the sentence level (Riloff and Wiebe [51], Yu and Hatzivassiloglou [62]), or the feature level (Hu and Liu [25], Popescu and Etzioni [50], Liu and Maes [36]). Extracting opinions expressed in comparative sentences is a difficult task, and some preliminary work can be found in Jindal and Liu [26], Jindal and Liu [27], Liu [35], and Pang and Lee [48].
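Steps (3) to (5) can be illustrated with a deliberately simple, document-level, lexicon-based scorer: each document gets an overall polarity, the polarity is quantified as a score, and the scores are aggregated into an overview. The tiny word lists, the one-word negation rule, and the function names are hypothetical assumptions for illustration; this is not one of the cited methods.

```python
import re

# Hypothetical toy lexicons; practical systems use much larger, curated lexicons.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}
NEGATORS = {"not", "never", "no"}


def document_score(text):
    """Steps (3)-(4): overall sentiment of one document as (#positive - #negative).

    A polarity word directly preceded by a negator has its sign flipped.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def aggregate(docs):
    """Step (5): summarize per-document polarities into an overview."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for doc in docs:
        s = document_score(doc)
        key = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        summary[key] += 1
    return summary


reviews = [
    "Great phone, I love the camera.",
    "Battery life is not good and charging is slow.",
    "It arrived on Tuesday.",
]
print(aggregate(reviews))
```

The negation rule hints at why the text calls sentiment analysis hard: "not good" flips polarity, and real language offers many subtler forms of ambiguity than a one-word lookback can capture.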
Performance evaluation of sentiment analysis is another challenge because of the lack of ground truth.
Social Recommendation
Traditional recommendation systems attempt to recommend items based on aggregated ratings of objects from users or on users' past purchase histories. A social recommendation system makes use of a user's social network and related information in addition to these traditional means. Social recommendation is based on the hypothesis that people who are socially connected are more likely to share the same or similar interests (homophily), and that users can be easily influenced by the friends they trust and prefer their friends' recommendations to random recommendations. The objectives of social recommendation systems are to improve the quality of recommendations and to alleviate the problem of information overload. Examples of social recommendation systems are book recommendations based on friends' reading lists on Amazon and friend recommendations on Twitter and Facebook. More details on social recommendation systems can be found in Konstas et al. [30], Ma et al. [38], and Backstrom and Leskovec [7].
Influence Modeling
Social scientists have been exploring influence and homophily in social networks for quite some time (McPherson et al. [42], Lazarsfeld and Merton [34]). It is important to know whether the underlying social network is influence driven or homophily driven. For example, in the advertising industry, if the social network is influence driven, then the influential users should be identified and incentivized to promote the products or services to the members of the social network. However, if the social network is homophily (similarity) driven, then individual users should be targeted directly to promote sales. Most social networks exhibit a mixture of homophily and influence; hence, distinguishing them is a challenge. Aral et al. [5] and La Fond and Neville [33] gave details on distinguishing social influence and homophily in social networks. Kempe et al. [29] studied the influence maximization problem. For a given information propagation model, influence maximization aims to identify, from a given snapshot of a social network, a set of initial influential users who can influence the maximum number of other users within given budget constraints. Agarwal et al. [3, 4] presented a preliminary model to identify influential bloggers in a community.
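One way to make influence maximization concrete is the greedy hill-climbing strategy analyzed by Kempe et al. [29]: repeatedly add the user with the largest marginal gain in expected spread, where the spread is estimated by simulating a propagation model. The sketch below assumes the independent cascade model with one uniform activation probability, a budget of k seed users, and a toy directed graph; the graph, the probability 0.3, the number of simulations, and all function names are illustrative assumptions.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One run of the independent cascade model; returns the number of activated users.

    Each newly activated user gets a single chance to activate each inactive
    out-neighbor, succeeding independently with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of activated users."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.3, runs=500, seed=0):
    """Greedily pick k seed users, each time adding the largest estimated marginal gain."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    seeds = []
    for _ in range(k):
        best = max(
            (n for n in sorted(nodes) if n not in seeds),
            key=lambda n: expected_spread(graph, seeds + [n], p, runs, rng),
        )
        seeds.append(best)
    return seeds


# Hypothetical directed "who can influence whom" graph.
graph = {
    "a": ["b", "c", "d", "e"],
    "b": ["c"],
    "f": ["g", "h"],
    "g": ["h"],
}
print(greedy_seeds(graph, k=2))
```

Here the greedy choice picks the hub "a" first and then "f", because a second seed inside a's already-covered neighborhood adds less expected spread than a seed in the separate component. Kempe et al. showed that this greedy rule is within a factor of (1 - 1/e) of the optimum for such models; its practical cost is the large number of simulations per candidate.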
The blogosphere follows a power-law distribution, with a few blogs being extremely influential and a huge number of blogs being largely unknown. Agarwal et al. reported that active bloggers are not necessarily influential and proposed effective influence measures to identify influential bloggers. A more general discussion of modeling and data mining in the blogosphere is given by Agarwal and Liu [2].
Information Diffusion and Provenance
Researchers study how information diffuses and explore different models of information diffusion, including the independent cascade model, the threshold model, the susceptible-infected model, and the susceptible-infected-recovered model. Many such models have been studied (Bailey [10], Granovetter [21], Mahajan [40], Macy [39], Berger [11]). Researchers apply these models to analyze the spread of rumors, computer viruses, and diseases during outbreaks. Two important problems from the social media viewpoint are (1) how information spreads in a social media network and which factors affect the spread, and (2) what the plausible sources are, given some information from social media. The first problem, information diffusion, has received considerable attention from researchers. The second problem, information provenance in social media, is still an open research problem and is recognized as a key issue in differentiating rumors from truth. Because social media data are distributed and dynamic, the conventional techniques used in classical provenance research cannot be directly applied to social media.
Privacy, Security, and Trust
The low barrier to entry and pervasive use of social media give rise to concerns about user privacy and security. New challenges arise from users' opposing needs: on one hand, a user would like to have as many friends and share as much as possible; on the other hand, a user would like to be as private as possible when needed. Being gregarious requires openness and transparency, whereas being private constrains one's sharing.
In addition, a social networking site has business needs to encourage users to find each other easily and to expand their friendship networks as widely as possible; in other words, to be open. Hence, social media poses new security challenges in fending off threats to users and organizations. With the variety of personal information disclosed in user profiles (e.g., information about other users and user networks may be indirectly accessible), individuals may put themselves and members of their social networks at risk of a variety of attacks. Social media has been the target of numerous passive as well as active attacks, including stalking, cyberbullying, malvertising, phishing, social spamming, scamming, and clickjacking. Gross and Acquisti [22] showed that only a few users change the default privacy preferences on Facebook. In some cases, user profiles are completely public, making information available and providing a communication mechanism to anyone who wants to access it. It is no secret that when a profile is made public, malicious users, including stalkers, spammers, and hackers, can use sensitive information for their personal gain. Sometimes malevolent users can even cause physical or emotional distress to other users (Rosenblum [52]). Narayanan and Shmatikov [45, 46] demonstrated how users' privacy can be weakened if an attacker knows the presence of connections among users. Wondracek et al. [60] presented a successful scheme to breach privacy by exploiting only the group membership information of users. Liu and Maes [36] pointed out a lack of privacy awareness and found a large number of social network profiles in which people described themselves with a rich vocabulary in terms of their passions and interests. Krishnamurthy and Wills [31] discussed the problem of leakage of personally identifiable information and how it can be misused by third parties (Narayanan and Shmatikov [46]). Squicciarini et al. [53] introduced a novel collective privacy mechanism for better managing content shared between users.
Fang and LeFevre [14] focused on helping users understand simple privacy settings but did not consider additional problems such as attribute inference (Zheleva and Getoor [64]) or shared data ownership (Squicciarini et al. [53]). Zheleva and Getoor [64] showed how an adversary can exploit an online social network with a mixture of public and private user profiles to predict the private attributes of users. Baden et al. [9] presented a framework in which users dictate who may access their information, based on public-private key encryption and decryption algorithms.
Social trust depends on many factors that cannot be easily modeled in a computational system. Many different definitions of trust have been proposed in the literature (Deutsch [13], Sztompka [54], Mui et al. [44], Olmedilla et al. [47], Grandison and Sloman [20], Artz and Gil [6]). A highly cited thesis on trust computation (Marsh [41]) provides theoretical perspectives on modeling trust, but its complex nature makes it very difficult to apply, especially to social networks (Golbeck and Hendler [18]). Trust between any two people is observed to be affected by many factors, including past experiences, opinions expressed and actions taken, contributions to spreading rumors, influence by others' opinions, and motives to gain something extra. Another important aspect of trust is the trustworthiness of user-generated content. Moturu and Liu [43] provided an intuitive scoring measure to quantify the trustworthiness of health-related user-generated content in social media.
3. Illustrative Examples of Mining Social Media
In this section, we present some examples from our research to illustrate how to mine social media data to address novel research issues. The first example is about assessing user vulnerability on a social networking site to maintain user privacy. The second example explores the importance of social and historical ties on a location-based social network. The third example introduces a method that can take advantage of multifaceted trust for predictive analytics.
3.1. Assessing User Vulnerability on a Social Networking Site (Gundecha et al. [23])
Attribute-value data are a principal data form in social media. Attributes available for every user on a social networking site can be categorized into two major types: individual attributes and community attributes. Individual attributes contain individual user information, including personal information such as gender, birth date, phone number, and home address. Community attributes contain information about the friends of a user; they include friends that are traceable from a user's profile (i.e., the user's friends list), tagged pictures, and wall interactions. Using the privacy and security settings of a profile, a user can control the visibility of most individual attributes but has limited control over the visibility of most community attributes. For example, Facebook users can now control photo tagging and the sharing of their friends list with the public, but they still cannot control friends sharing their own friends lists or uploading photos of them from their profiles to the public. Our earlier work (Gundecha et al. [23]) used a large-scale Facebook data set to assess attribute visibility, which can be used to characterize the general behavior of Facebook users. For example, Facebook users do not usually disclose their mobile phone numbers. Hence, users who do disclose phone numbers have a propensity to be vulnerable, because they disclose more sensitive information in their profiles. A large portion of users are either not careful about or not aware of the consequences of their actions on the private information of their friends. Thus, protecting community attributes is important in protecting user privacy. Gundecha et al. [23] proposed a novel mechanism to help users protect themselves against vulnerability.
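The attribute-visibility idea can be sketched in a few lines: compute, for each attribute, the fraction of users who make it public, then flag users who disclose attributes that most users keep hidden (such as a phone number). The profile representation (a set of publicly visible attribute names per user), the toy profiles, the 0.3 threshold, and the function names are hypothetical assumptions; this is not the I-index of Gundecha et al. [23].

```python
def attribute_visibility(profiles, attributes):
    """Fraction of users who make each attribute publicly visible."""
    n = len(profiles)
    return {a: sum(a in p for p in profiles) / n for a in attributes}


def rare_disclosures(profile, visibility, threshold=0.3):
    """Attributes this user discloses although few users do (visibility below a
    hypothetical threshold); such disclosures suggest a propensity to vulnerability."""
    return sorted(a for a in profile if visibility.get(a, 0.0) < threshold)


# Hypothetical profiles: each set lists the attributes a user makes public.
profiles = [
    {"gender", "hometown"},
    {"gender", "hometown", "birth_date"},
    {"gender"},
    {"gender", "hometown", "birth_date", "phone"},
]
attributes = ["gender", "hometown", "birth_date", "phone"]

vis = attribute_visibility(profiles, attributes)
print(vis)
print([rare_disclosures(p, vis) for p in profiles])
```

Only the fourth user is flagged, because only that user publishes a phone number, which the other profiles keep hidden.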
Users on a social networking site are often unaware that they could pose a threat to their friends because of their vulnerability. Gundecha et al. [23] showed that it is feasible to measure a user's vulnerability based on three factors: (1) the user's privacy settings, which can reveal personal information; (2) the user's actions on a social networking site, which can expose his or her friends' personal information; and (3) friends' actions on a social networking site, which can reveal the user's personal information. Based on these factors, Gundecha et al. [23] formally presented one of the earliest models for vulnerability reduction. They proposed a four-step procedure to estimate user vulnerability: (1) estimate the risk to privacy due to individual attributes (the I-index); (2) estimate the risk to privacy due to community attributes (the C-index); (3) estimate the visibility of a user based on the I-index and C-index (the P-index); and (4) estimate the user's vulnerability based on the P-indexes of the user and his or her one-hop friends (the V-index). Figures 1(a) and 1(b) show the relationship between the I-index and C-index, and between the P-index and V-index, for 100,000 randomly chosen Facebook users. Note that users are sorted in ascending order of their I-indexes and P-indexes. The x-axis and y-axis show users and their index values, respectively. A user's vulnerable friend is defined as a friend whose unfriending will lower the user's V-index score. This definition of a vulnerable friend can be generalized to multiple vulnerable friends. Figure 2 compares the V-index values of each user before and after unfriending the user's k most vulnerable friends. For each graph in Figure 2, the x-axis and y-axis indicate users and their V-index values, respectively.
Without loss of generality, we sorted all users in ascending order of their existing V-index values before plotting the graphs in Figure 2. We ran the experiment on 300,000 randomly selected users of the Facebook data set. Figure 2 shows the performance comparison of V-index values for each user before (+) and after ( ) unfriending the k most vulnerable friends, for k = 1, 2, 10, and 50. As expected, vulnerability decreases consistently as the value of k increases, as seen in Figures 2(a) to 2(d).
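The definition of a vulnerable friend, and its generalization to k friends, can be illustrated with a small greedy procedure. The tutorial does not give the index formulas, so this sketch substitutes a hypothetical V-index: the mean P-index over the user and his or her one-hop friends. Under that assumption, it repeatedly unfriends the friend whose removal lowers the V-index the most and stops early when no removal helps. The P-index values, the aggregation rule, and the function names are all illustrative assumptions, not the method of Gundecha et al. [23].

```python
def v_index(user, friends, p_index):
    """Hypothetical V-index: mean P-index over the user and his or her one-hop friends.

    This aggregation is an assumption for illustration only.
    """
    group = [user] + list(friends)
    return sum(p_index[u] for u in group) / len(group)


def most_vulnerable_friends(user, friends, p_index, k):
    """Greedily unfriend up to k friends, each time removing the friend whose
    removal lowers the (hypothetical) V-index the most; stop early if none does."""
    remaining = list(friends)
    removed = []
    for _ in range(k):
        current = v_index(user, remaining, p_index)
        best, best_score = None, current
        for f in remaining:
            score = v_index(user, [g for g in remaining if g != f], p_index)
            if score < best_score:
                best, best_score = f, score
        if best is None:  # no single unfriending lowers the V-index any further
            break
        remaining.remove(best)
        removed.append(best)
    return removed, v_index(user, remaining, p_index)


# Hypothetical P-index values in [0, 1] for user "u" and four friends.
p_index = {"u": 0.2, "f1": 0.9, "f2": 0.1, "f3": 0.6, "f4": 0.3}
friends = ["f1", "f2", "f3", "f4"]
for k in (1, 2):
    print(k, most_vulnerable_friends("u", friends, p_index, k))
```

With these numbers, the V-index drops from 0.42 to about 0.3 after removing "f1" and to about 0.2 after also removing "f3", mirroring, on a toy scale, the observation that vulnerability decreases as k grows. A greedy choice is used here for simplicity; it is not claimed to be the selection rule used in the original experiments.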
[Figure 1. Relationship among index values for each user. (a) I-index and C-index for each user; (b) P-index and V-index for each user. x-axis: users; y-axis: index values.]

[Figure 2. Performance comparison of V-index values for each user before (+) and after unfriending the k most vulnerable friends from his or her social network. (a) Most vulnerable (M1-index vs. V-index); (b) 2 most vulnerable (M2-index vs. V-index); (c) 10 most vulnerable (M10-index vs. V-index); (d) 50 most vulnerable (M50-index vs. V-index). x-axis: users; y-axis: index values.]

Exploring Social-Historical Ties on Location-Based Social Networks (Gao et al. [16])

LBSNs have been a popular form of social media in recent years. They provide location-related services that allow users to check in at geographical locations and share such experiences with their friends. Millions of check-in records in LBSNs contain rich social and geographical information and provide a unique opportunity for researchers to study users' social behavior from a spatial-temporal perspective, which in turn enables a variety of services including place advertisement or recommendation, traffic forecasting, and disaster relief. To understand a user's check-in behavior, a historical analysis of the user is essential, because historical check-ins provide rich information about a user's interests and hints about when and where the user is likely to go. In addition, social correlation theory suggests considering users' social ties, because human movement is often driven by social events, such as visiting friends or going out with colleagues. These two kinds of ties shape a user's check-in experience on LBSNs, and each gives rise to a different probability of check-in activity, which indicates that people in different spatial-temporal social circles interact differently.

Historical ties of a user's check-in behavior on LBSNs have two properties. First, a user's check-in history approximately follows a power-law distribution; i.e., a user goes to a few places many times and to many places only a few times. Second, historical ties have a short-term effect. Taking advantage of the similarity between language modeling and location-based social network mining, Gao et al. [16] introduced the Pitman-Yor process to LBSNs to model the historical ties of user i's check-in behavior $c_{n+1} = l$ (the (n + 1)th check-in, at location l), capturing both the power-law distribution and the short-term effect. The resulting historical model (HM) is

$$P^{i}_{H}(c_{n+1} = l) = P^{i,i}_{HPY}(c_{n+1} = l \mid u, t_{ul}, d_u, r_u, t_u),$$

where $u$, $t_{ul}$, $d_u$, $r_u$, and $t_u$ are parameters.
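To make the power-law intuition concrete, here is a minimal sketch of the predictive probability of a user's next check-in under a single Pitman-Yor "restaurant", assuming each visited location occupies exactly one table (a simplification closely related to interpolated Kneser-Ney smoothing). It is not Gao et al.'s hierarchical model: it ignores the hierarchy and the short-term effect, and the names `next_checkin_prob`, `discount`, `strength`, and `base_prob` are illustrative. The discount removes a constant amount of mass from every visited location and hands it to the base distribution, while revisits stay proportional to past counts, which is how Pitman-Yor models reproduce heavy-tailed visit frequencies.

```python
from collections import Counter


def next_checkin_prob(history, location, base_prob, discount=0.5, strength=1.0):
    """Predictive P(c_{n+1} = location | history) under one Pitman-Yor restaurant.

    Simplifying assumption: each distinct visited location occupies exactly one
    table. No hierarchy and no short-term (time-decay) effect are modeled.
    history   -- the user's past check-in locations
    base_prob -- base distribution G0, a function location -> probability
    """
    if not (0.0 <= discount < 1.0 and strength > -discount):
        raise ValueError("need 0 <= discount < 1 and strength > -discount")
    n = len(history)
    if n == 0:
        return base_prob(location)  # no history: fall back to the base distribution
    counts = Counter(history)
    k = len(counts)  # occupied tables = distinct locations visited
    denom = strength + n
    # Revisit mass: discounted count (zero for never-visited locations).
    revisit = max(counts[location] - discount, 0.0) / denom
    # New-table mass, spread over locations according to the base distribution.
    new_table = (strength + discount * k) / denom * base_prob(location)
    return revisit + new_table


places = ["home", "gym", "cafe", "park"]


def uniform(loc):
    return 1.0 / len(places)


history = ["home", "home", "home", "gym", "cafe"]
probs = {p: next_checkin_prob(history, p, uniform) for p in places}
print(probs)
```

The probabilities over all candidate places sum to one, frequently visited places dominate, and a never-visited place still receives some mass through the base distribution.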
A social-historical model (SHM) is then proposed to explore user i's check-in behavior by integrating both social and historical effects:

$$P^{i}_{SH}(c_{n+1} = l) = \eta P^{i}_{H}(c_{n+1} = l) + (1 - \eta) P^{i}_{S}(c_{n+1} = l),$$

where

$$P^{i}_{S}(c_{n+1} = l) = \sum_{u_j \in N(u_i)} \mathrm{sim}(u_i, u_j)\, P^{i,j}_{HPY}(c_{n+1} = l),$$

$N(u_i)$ denotes user i's set of neighbors, and $\eta$ weights the historical component against the social one.

Location-prediction experiments on a real-world LBSN compare the proposed methods (HM and SHM) with several baseline methods. The results, plotted in Figure 3, show that the proposed methods model users' check-ins best in terms of location prediction; in other words, social and historical ties help location prediction. This work finds that a user's friends can influence his or her next location, because users who share friends tend to go to more similar locations than those who do not. The power-law property and the short-term effect are observed in historical ties, and the historical model is designed to capture these properties. The experimental results on location prediction demonstrate that the proposed approach captures users' check-in properties well and outperforms the compared models.

[Figure 3. The performance comparison of prediction models demonstrates that by considering social and historical ties, the proposed models can help location prediction. Models: MFC, MFT, Order-1, Order-2, HM, SHM. x-axis: fraction of training set; y-axis: prediction accuracy. Note. MFC, most frequent check-in model; MFT, most frequent time model; Order-1, order-1 Markov model; Order-2, order-2 Markov model.]

Discerning Multifaceted Trust in a Connected World (Tang et al. [57])

The issue of trust has attracted increasing attention from the social media research community. Trust, as a social concept, naturally has multiple facets, which implies multiple and heterogeneous trust relationships between users. Consider a multifaceted trust example from Epinions. Figure 4(a) shows the single trust relationships between user 1 and his 20 friends; in this view, user 7 is the most trusted friend of user 1. Figures 4(b) and 4(c) show their multifaceted trust relationships in the categories "home and garden" and "restaurants," respectively. In the category "home and garden," user 7 is not necessarily the most trusted friend of user 1. Trust relationships therefore vary across categories: people trust others differently in different facets.

[Figure 4. Single trust and multifaceted trust relationships of one user in Epinions. (a) Single trust; (b) trust in home and garden; (c) trust in restaurants. Note. The thickness of a line indicates the level of trust.]

Obtaining multifaceted trust between users poses two challenges: first, representing multiple and heterogeneous trust relationships between users, and second, estimating the strength of multifaceted trust. Traditionally, trust is represented by an adjacency matrix, which cannot capture multifaceted trust relations. Tang et al. [57] developed a new algorithm, mTrust, that extends the matrix representation to a tensor representation, adding an extra dimension to describe facets. Previous work observed a strong correlation between trust and user similarity in the context of rating systems, so it is reasonable to embed trust-strength inference in rating prediction. Accordingly, to evaluate the usefulness of multifaceted trust, this work embeds multifaceted trust inference in a rating-prediction framework. The experiments yield interesting findings: (1) more than 20% of reciprocal links are heterogeneous, (2) more than 14% of transitive trust relations are heterogeneous, and (3) more than 11% of cocitation trust relations are heterogeneous. Given these findings, mTrust can be applied to many online tasks, such as improving rating prediction, enabling facet-sensitive ranking, and making status theory applicable to reciprocal links.

4. Employing Social Media in Real-World Applications

Social media has been increasingly used in a wide range of domains; examples include political campaigns (e.g., presidential elections), mass movements (e.g., organizing the Occupy Wall Street movement and the Arab Spring), and disaster and crisis response and relief coordination. In this section, we show how social media mining is used for HADR.

Governments and nongovernmental organizations (NGOs) face challenges in responding effectively to crises such as natural disasters (e.g., tsunamis, hurricanes, earthquakes). Providing efficient relief to victims is a primary focus in saving lives, minimizing further losses in the aftermath, and accelerating recovery. Crowdsourcing tools via social media, such as Twitter, Ushahidi, and Sahana, have proven useful in gathering information during and after a crisis. However, relief efforts require capabilities that go beyond crowdsourcing: response coordination, secured collaboration, and trust building among relief organizations. Goolsby [19] pointed out the need for such a social media system and described the effort to build one that allows different organizations to share crowdsourced and groupsourced information, and to analyze and visualize the processed information for intelligent decision making. In this section, we describe two social media prototype systems designed to facilitate efficient collaboration among disparate organizations for effective, coordinated responses to crises.

ASU Coordination Tracker (Gao et al. [17])

When natural disasters occur, the international community often joins forces to provide disaster relief and humanitarian assistance. Prominent examples include the Haiti earthquake and the subsequent cholera outbreak, and the devastating earthquake and tsunami in Japan. Social media has revolutionized the use of traditional media and played an important role in these events as an information collector and disseminator, and as a communication and collaboration tool. Crowdsourcing, one of the most important functions of social media, is a collaborative information-sharing mechanism based on the principle of collective wisdom. Wikipedia is a prime example of crowdsourcing, where people collectively publish information on various topics. Crowdsourcing leverages participatory social media services and tools to collect information, and it allows crowds to participate in various HADR tasks. Its integration with crisis maps has been a very effective crowdsourcing application in HADR efforts. One of the earliest such systems was Alive in Afghanistan, where people in Afghanistan submitted reports on accidents and terrorist attacks across the country.

Although crowdsourcing applications can provide accurate and timely information about a crisis for decision making, current crowdsourcing applications still fall short in supporting disaster relief efforts (Gao et al. [15]). Most importantly, they do not provide a common mechanism specifically designed for collaboration and coordination among disparate relief organizations. Relief organizations that work independently can cause conflicts and complicate relief efforts. For example, if independent organizations duplicate a response, they draw resources away from other areas in need that could use the duplicated supplies, and they can delay the response to other disaster areas.
Furthermore, because of the noisy and chaotic nature of crowdsourced data, current crowdsourcing applications cannot provide readily usable information for disaster relief efforts.
To address these shortfalls of crowdsourcing to a certain extent, an online coordination system, the ASU Coordination Tracker (ACT), was devised. The ACT is an event response coordination system whose primary goal is to facilitate multiorganization response (military, governments, NGOs, etc.) to an event such as a disaster, and to provide relief organizations the means for better collaboration and coordination during a crisis. It offers a speedy and effective approach with easy, open communication and streamlined coordination. The major features and advantages of the ACT are summarized below:
- leveraging crowdsourced information to provide the means for a groupsourcing response, so that organizations can effectively assist persons affected by a crisis;
- developing novel strategies to analyze requests and maximize coordinated efforts based on crowdsourced data for disaster relief;
- visualizing requests to give responders a global view of the request distribution; and
- increasing coordination efficiency by optimally responding to requests.

The ACT consists of five functional modules: request collection, request analysis, response, coordination, and situation awareness. Raw requests from users are collected via crowdsourcing and groupsourcing methods. The request analysis module combines data mining technology and expert knowledge to iteratively capture the essential content of raw requests. Based on the demand and the disaster background, raw requests are analyzed and classified into various categories, and the essential content extracted from the raw data is stored in a requests pool. The system visualizes the requests pool on its crisis map, with selective visibility tied to the access levels of users (organizations). The response module allows relief organizations to contribute, receive, and evaluate different response options and to plan response actions.
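As a rough illustration of the request analysis and requests-pool steps, the sketch below classifies raw request text into categories by keyword matching and files each request under its categories. The keyword table merely stands in for the expert knowledge mentioned above; the categories, the `Request` record, and the helpers `classify` and `add_to_pool` are hypothetical. The ACT's actual analysis combines data mining with expert input iteratively, which this one-pass rule set does not attempt.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for expert knowledge.
CATEGORY_KEYWORDS = {
    "water": {"water", "thirsty", "drinking"},
    "food": {"food", "hungry", "rice"},
    "medical": {"injured", "medicine", "doctor", "cholera"},
    "shelter": {"shelter", "tent", "homeless"},
}


@dataclass
class Request:
    text: str
    location: tuple  # (latitude, longitude) of the person in need
    categories: list = field(default_factory=list)


def classify(text):
    """Return the sorted categories whose keywords appear in text, else ['other']."""
    tokens = {tok.strip(".,;:!?").lower() for tok in text.split()}
    matched = sorted(cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & words)
    return matched or ["other"]


def add_to_pool(pool, text, location):
    """Classify a raw request and file it under each of its categories in the pool."""
    request = Request(text, location, classify(text))
    for cat in request.categories:
        pool.setdefault(cat, []).append(request)
    return request


pool = {}
add_to_pool(pool, "Need drinking water and a doctor!", (18.54, -72.34))
add_to_pool(pool, "Family of five homeless, need a tent.", (18.51, -72.29))
print({cat: len(reqs) for cat, reqs in pool.items()})
```

A request that touches several needs is filed under each matching category, so different organizations browsing the pool by category all see it.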
Relief organizations can respond to requests directly through the crisis map. The coordination model uses the interagency concept (Goolsby [19]) to avoid response conflicts while maintaining centralized control. The ACT displays every available request on the crisis map, and each request is in one of four states: available, in process, in delivery, or delivered. To avoid conflicts, relief organizations cannot respond to requests that are being fulfilled by another organization. A statistics module runs in the background to track relief progress for situation awareness, relief strategy adjustments, and further decision making. The ACT is a developing system that enables coordination among organizations during disaster relief; we are investigating approaches to improve collaboration efficiency and provide differential security to relief organizations and decision makers. The ASU Crisis Response Game for disaster relief (Abbasi et al. [1]) has demonstrated that by both leveraging crowdsourced information and providing the means for a groupsourcing response, organizations can effectively assist victims.

TweetTracker (Kumar et al. [32])

TweetTracker is a Twitter-based analytic and visualization tool. Its focus is to help HADR relief organizations acquire situational awareness during disasters and emergencies to aid disaster relief efforts (Kumar et al. [32]). New social media platforms, such as the Twitter microblogging service, have demonstrated their value and capability to provide information that is not easily obtainable from traditional media. For example, during the 2011 Mumbai blasts, firsthand information from the affected region was available on Twitter moments after the blasts. TweetTracker is designed to help track, analyze, review, and monitor tweets. This is achieved through near-real-time tracking of tweets containing specific keywords or hashtags and of tweets generated from the region affected by the crisis.
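The tracking criteria just described (keywords, hashtags, and the affected region) amount to a simple predicate over incoming tweets. The sketch below applies such a predicate to an in-memory list; it does not call the Twitter API, and the tweet schema (a dict with `text` and optional `coords` given as (longitude, latitude)), the bounding-box format, and the names `matches` and `track` are assumptions made for illustration.

```python
import re

WORD = re.compile(r"\w+")
HASHTAG = re.compile(r"#(\w+)")


def matches(tweet, keywords, hashtags, bbox):
    """True if a tweet mentions a tracked keyword or hashtag, or was posted inside bbox.

    keywords, hashtags -- lowercase sets (hashtags without the leading '#')
    bbox -- (min_lon, min_lat, max_lon, max_lat) of the affected region
    """
    text = tweet["text"].lower()
    if set(WORD.findall(text)) & keywords:
        return True
    if set(HASHTAG.findall(text)) & hashtags:
        return True
    coords = tweet.get("coords")
    if coords is None:
        return False
    lon, lat = coords
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def track(stream, keywords, hashtags, bbox):
    """Lazily yield the tweets from an iterable stream that match the criteria."""
    return (t for t in stream if matches(t, keywords, hashtags, bbox))


mumbai = (72.75, 18.85, 73.05, 19.30)  # rough, illustrative bounding box
tweets = [
    {"text": "Explosion heard near Dadar #MumbaiBlasts", "coords": None},
    {"text": "Stay safe everyone", "coords": (72.84, 19.02)},
    {"text": "Lovely weather today", "coords": (-0.13, 51.51)},
]
hits = list(track(tweets, {"blast", "explosion"}, {"mumbaiblasts"}, mumbai))
print([t["text"] for t in hits])
```

The second tweet contains no tracked term but is kept because it was posted inside the region, which is the reason for combining content filters with a geographic one.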
The tool supports monitoring and analysis of the collected tweets via real-time trending, data reduction, historical review, and integrated data mining techniques. TweetTracker consists of three main components: (1) a Twitter stream reader, (2) a data storage module, and (3) a data mining and visualization module. The Twitter stream reader is a data collection module that continually crawls tweets through the Twitter streaming API (Application Programming Interface). Tweets are filtered based on user-specified keywords, hashtags, and geolocations. The data storage module stores and indexes the collected tweets in a relational database for use by the visualization module. The data mining and visualization module is a Web-based user interface for browsing and analyzing the collected tweets. It provides geospatial visualization of tweets related to a particular event on a map, summarizes the tweets, visualizes trending keywords as a word cloud, and identifies popular resources (URLs) and users mentioned in the tweets. The tool also includes built-in language translation support for monitoring multilingual tweets. TweetTracker has been used to track, visualize, and analyze activities including the Arab Spring movement, the Occupy Wall Street movement, and various natural disasters such as earthquakes, as well as cholera outbreaks.

5. Summary

Valuable information is hidden in vast amounts of social media data, presenting ample opportunities for social media mining to discover actionable knowledge that is otherwise difficult to find. Social media data are vast, noisy, distributed, unstructured, and dynamic, which poses novel challenges for data mining.
In this tutorial, we offer a brief introduction to mining social media, use illustrative examples to show that burgeoning social media mining is spearheading social media research, and demonstrate its contributions to real-world applications. As a main type of big data, social media is finding many innovative uses, such as political campaigns, job applications, business promotion and networking, and customer service; using and mining social media is reshaping business models, accelerating viral marketing, and enabling the rapid growth of various grassroots communities. It also helps in trend analysis and sales prediction. Social media data will continue to grow rapidly in the foreseeable future, and we face an increasing demand for new algorithms and social media mining tools. The preliminary success of existing social media mining research demonstrates the promising future of the emerging social media mining community and will help expand research and development and explore online and off-line human behavior and interaction patterns.

Acknowledgments

The authors thank Huiji Gao, Shamanth Kumar, Jiliang Tang, and DMML members for their assistance and feedback in preparing this tutorial. Some projects described in this brief introductory survey were sponsored by the Office of Naval Research [ONR N ]; the Army Research Office [ARO ]; and the National Science Foundation [Grant ].

References

[1] M. A. Abbasi, S. Kumar, J. A. Andrade Filho, and H. Liu. Lessons learned in using social media for disaster relief: ASU Crisis Response Game. Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer-Verlag, Berlin.
[2] N. Agarwal and H. Liu. Modeling and Data Mining in Blogosphere. Morgan & Claypool Publishers, San Rafael, CA.
[3] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the influential bloggers in a community. Proceedings of the International Conference on Web Search and Web Data Mining. Association for Computing Machinery, New York.
[4] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Modeling blogger influence in a community. Social Network Analysis and Mining 2(2).
[5] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences of the United States of America 106(51):21544.
[6] D. Artz and Y. Gil. A survey of trust in computer science and the Semantic Web. Web Semantics: Science, Services and Agents on the World Wide Web 5(2):58-71.
[7] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, New York.
[8] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: Membership, growth, and evolution. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, 44-54.
[9] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin. Persona: An online social network with user-defined privacy. ACM SIGCOMM Computer Communication Review 39(4).
[10] N. T. J. Bailey. The Mathematical Theory of Infectious Diseases and Its Applications. Charles Griffin, High Wycombe, UK.
[11] E. Berger. Dynamic monopolies of constant size. Journal of Combinatorial Theory, Series B 83(2).
[12] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on Twitter: Human, bot, or cyborg? Proceedings of the 26th Annual Computer Security Applications Conference. Association for Computing Machinery, New York, 21-30.
[13] M. Deutsch. Cooperation and trust: Some theoretical notes. Nebraska Symposium on Motivation. University of Nebraska Press, Lincoln.
[14] L. Fang and K. LeFevre. Privacy wizards for social networking sites. Proceedings of the 19th International Conference on World Wide Web. Association for Computing Machinery, New York.
[15] H. Gao, G. Barbier, and R. Goolsby. Harnessing the crowdsourcing power of social media for disaster relief. IEEE Intelligent Systems 26(3):10-14.
[16] H. Gao, J. Tang, and H. Liu. Exploring social-historical ties on location-based social networks. Proceedings of the 6th International AAAI Conference on Weblogs and Social Media. Association for the Advancement of Artificial Intelligence, Palo Alto, CA.
[17] H. Gao, X. Wang, G. Barbier, and H. Liu. Promoting coordination for disaster relief: From crowdsourcing to coordination. Proceedings of the 4th Conference on Social Computing, Behavioral-Cultural Modeling and Prediction. Springer-Verlag, Berlin.
[18] J. Golbeck and J. Hendler. Inferring binary trust relationships in web-based social networks. ACM Transactions on Internet Technology 6(4).
[19] R. Goolsby. Social media as crisis platform: The future of community maps/crisis maps. ACM Transactions on Intelligent Systems and Technology 1(1):1-11.
[20] T. Grandison and M. Sloman. A survey of trust in Internet applications. IEEE Communications Surveys & Tutorials 3(4):2-16.
[21] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology 83(6).
[22] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society. Association for Computing Machinery, New York, 71-80.
[23] P. Gundecha, G. Barbier, and H. Liu. Exploiting vulnerability to secure user privacy on a social networking site. The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, 2011.
[24] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.
[25] M. Hu and B. Liu. Mining and summarizing customer reviews. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York.
[26] N. Jindal and B. Liu. Identifying comparative sentences in text documents. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York.
[27] N. Jindal and B. Liu. Opinion spam and analysis. Proceedings of the International Conference on Web Search and Web Data Mining. Association for Computing Machinery, New York.
[28] A. M. Kaplan and M. Haenlein. Users of the world, unite! The challenges and opportunities of social media. Business Horizons 53(1):59-68.
[29] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York.
[30] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York.
[31] B. Krishnamurthy and C. E. Wills. On the leakage of personally identifiable information via online social networks. ACM SIGCOMM Computer Communication Review 40(1).
[32] S. Kumar, R. Zafarani, and H. Liu. Understanding user migration patterns across social media. Twenty-Fifth International Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence, Palo Alto, CA.
[33] T. La Fond and J. Neville. Randomization tests for distinguishing social influence and homophily effects. Proceedings of the 19th International Conference on World Wide Web. Association for Computing Machinery, New York.
[34] P. F. Lazarsfeld and R. K. Merton. Friendship as a social process: A substantive and methodological analysis. Freedom and Control in Modern Society 18:18-66.
[35] B. Liu. Sentiment analysis and subjectivity. Handbook of Natural Language Processing. CRC Press, Boca Raton, FL.
[36] H. Liu and P. Maes. InterestMap: Harvesting social network profiles for recommendations. Workshop: Beyond Personalization, San Diego.
[37] H. Liu and H. Motoda. Computational Methods of Feature Selection. Chapman & Hall, Boca Raton, FL.
[38] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, New York.
[39] M. W. Macy. Chains of cooperation: Threshold effects in collective action. American Sociological Review 56(6).
[40] V. Mahajan, E. Muller, and F. M. Bass. New product diffusion models in marketing: A review and directions for research. Journal of Marketing 54(1):1-26.
[41] S. P. Marsh. Formalising trust as a computational concept. Ph.D. thesis, Department of Computing Science and Mathematics, University of Stirling, Stirling, UK.
[42] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27.
[43] S. T. Moturu and H. Liu. Quantifying the trustworthiness of social media content. Distributed and Parallel Databases 29(3).
[44] L. Mui, M. Mohtashemi, and A. Halberstadt. A computational model of trust and reputation for E-businesses. Proceedings of the 35th Annual Hawaii Conference on System Sciences (HICSS'02). IEEE Computer Society, Washington, DC, 2002.
[45] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. Proceedings of the 2008 IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC.
[46] A. Narayanan and V. Shmatikov. De-anonymizing social networks. Proceedings of the 2009 IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC.
[47] D. Olmedilla, O. Rana, B. Matthews, and W. Nejdl. Security and trust issues in semantic grids. Proceedings of the Dagstuhl Seminar, Semantic Grid: The Convergence of Technologies 5271.
[48] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2):1-135.
[49] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Vol. 10. Association for Computational Linguistics, Stroudsburg, PA, 79-86.
[50] A. M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA.
[51] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA.
[52] D. Rosenblum. What anyone can know: The privacy risks of social networking sites. IEEE Security and Privacy 5(3):40-49.
[53] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks. Proceedings of the 18th International Conference on World Wide Web. Association for Computing Machinery, New York.
[54] P. Sztompka. Trust: A Sociological Theory. Cambridge University Press, Cambridge, UK.
[55] P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson Addison Wesley, Boston.
[56] L. Tang and H. Liu. Community Detection and Mining in Social Media, Vol. 2. Morgan & Claypool Publishers, San Rafael, CA.
[57] J. Tang, H. Gao, and H. Liu. mTrust: Discerning multi-faceted trust in a connected world. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, New York.
[58] P. D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA.
[59] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco.
[60] G. Wondracek, T. Holz, E. Kirda, and C. Kruegel. A practical attack to de-anonymize social network users. Proceedings of the 2010 IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC.
[61] S. Yardi, D. Romero, G. Schoenebeck, and D. Boyd. Detecting spam in a Twitter network. First Monday 15(1):1-4.
[62] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA.
[63] Z. A. Zhao and H. Liu. Spectral Feature Selection for Data Mining. Chapman & Hall/CRC Press, Virginia Beach, VA.
[64] E. Zheleva and L. Getoor. To join or not to join: The illusion of privacy in social networks with mixed public and private user profiles. Proceedings of the 18th International Conference on World Wide Web. Association for Computing Machinery, New York, 2009.
WHITE PAPER Social media analytics in the insurance industry Introduction Insurance is a high involvement product, as it is an expense. Consumers obtain information about insurance from advertisements,
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
SOCIAL MEDIA OPTIMIZATION
SOCIAL MEDIA OPTIMIZATION Proxy1Media is a Full-Service Internet Marketing, Web Site Design, Interactive Media & Search-Engine Marketing Company in Boca Raton, Florida. We specialize in On-Line Advertising
Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
Exploring Big Data in Social Networks
Exploring Big Data in Social Networks [email protected] ([email protected]) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Online Reputation Management Proposal (ORM)
Executive Summary Online Reputation Management Proposal (ORM) CitiReview s technique involves a mix of the most powerful social media campaign strategies in the industry. Our focus is getting and dividing
Data Isn't Everything
June 17, 2015 Innovate Forward Data Isn't Everything The Challenges of Big Data, Advanced Analytics, and Advance Computation Devices for Transportation Agencies. Using Data to Support Mission, Administration,
Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.
Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital
Use of Social Media in Natural Disaster Management
Use of Social Media in Natural Disaster Management Dimiter Velev 1 + and Plamena Zlateva 2 1 University of National and World Economy, Sofia, Bulgaria 2 Institute of System Engineering and Robotics - BAS,
Information Visualization WS 2013/14 11 Visual Analytics
1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and
DIGITAL STRATEGY AND TACTICS FOR BRAND REPUTATION MANAGEMENT
FOR BRAND REPUTATION MANAGEMENT Do you know what your customers are saying about your brand in the online world? How about your competitors? What about your ex-employees? The Internet and many Web 2.0
Contact Recommendations from Aggegrated On-Line Activity
Contact Recommendations from Aggegrated On-Line Activity Abigail Gertner, Justin Richer, and Thomas Bartee The MITRE Corporation 202 Burlington Road, Bedford, MA 01730 {gertner,jricher,tbartee}@mitre.org
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Sentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals
The University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
WSI White Paper. Prepared by: Francois Muscat Search Engine Optimization Expert, WSI
Make Sure Your Company is Visible on Google with Search Engine Optimization Your Guide to Lead Generation in Tough Economic Times WSI White Paper Prepared by: Francois Muscat Search Engine Optimization
Social Media Implementations
SEM Experience Analytics Social Media Implementations SEM Experience Analytics delivers real sentiment, meaning and trends within social media for many of the world s leading consumer brand companies.
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams
2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment
Online Reputation Management Services
Online Reputation Management Services Potential customers change purchase decisions when they see bad reviews, posts and comments online which can spread in various channels such as in search engine results
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
In every ear it spread,
In every ear it spread, on every ton gue it grew, and then the whole world knew, there was nothing you could do. Online Reputation Management The best part of working in this company the people and the
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
How To Listen To Social Media
WHITE PAPER Turning Insight Into Action The Journey to Social Media Intelligence Turning Insight Into Action The Journey to Social Media Intelligence From Data to Decisions Social media generates an enormous
Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo
Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
JamiQ Social Media Monitoring Software
JamiQ Social Media Monitoring Software JamiQ's multilingual social media monitoring software helps businesses listen, measure, and gain insights from conversations taking place online. JamiQ makes cutting-edge
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
Social media has recently played a critical
C Y B E R - P H Y S I C A L - S O C I A L S Y S T E M S Editor: Daniel Zeng, University of Arizona, [email protected] Harnessing the Crowdsourcing Power of Social Media for Disaster Relief Huiji Gao
Social Media Glossary of Terms For Small Business Owners
Social Media Glossary of Terms For Small Business Owners Introduction As a small business, reaching your audience efficiently and cost-effectively way is critical to your success. Social media platforms
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.
An Overview of Database management System, Data warehousing and Data Mining
An Overview of Database management System, Data warehousing and Data Mining Ramandeep Kaur 1, Amanpreet Kaur 2, Sarabjeet Kaur 3, Amandeep Kaur 4, Ranbir Kaur 5 Assistant Prof., Deptt. Of Computer Science,
Quick Guide to Getting Started: Twitter for Small Businesses and Nonprofits
Quick Guide to Getting Started: Twitter for Small Businesses and Nonprofits Social Media www.constantcontact.com 1-866-876-8464 Insight provided by 2011 Constant Contact, Inc. 11-2168 What is Twitter?
Big Data: Image & Video Analytics
Big Data: Image & Video Analytics How it could support Archiving & Indexing & Searching Dieter Haas, IBM Deutschland GmbH The Big Data Wave 60% of internet traffic is multimedia content (images and videos)
CORRALLING THE WILD, WILD WEST OF SOCIAL MEDIA INTELLIGENCE
CORRALLING THE WILD, WILD WEST OF SOCIAL MEDIA INTELLIGENCE Michael Diederich, Microsoft CMG Research & Insights Introduction The rise of social media platforms like Facebook and Twitter has created new
A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics
contents A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics Abstract... 2 Need of Social Content Analytics... 3 Social Media Content Analytics... 4 Inferences
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
ICT Perspectives on Big Data: Well Sorted Materials
ICT Perspectives on Big Data: Well Sorted Materials 3 March 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations in
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Sentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Oracle Big Data Discovery The Visual Face of Hadoop
Disclaimer: This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development,
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
How To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
Social Media Guidelines for Best Practice
Social Media Guidelines for Best Practice September 2009 Contents: Listen and research the social media environment Page 3 & 4 Set the parameters before you start Page 4 Getting Started Page 5-6 In Summary
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
How To Create An Insight Analysis For Cyber Security
IBM i2 Enterprise Insight Analysis for Cyber Analysis Protect your organization with cyber intelligence Highlights Quickly identify threats, threat actors and hidden connections with multidimensional analytics
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
Social Media & Internet Marketing :: Menu of Services
Social Media & Internet Marketing :: Menu of Services Social Networking Setup & Manage Company profiles on major social networks; Facebook, Linkedin and Twitter (includes custom background) see info below
What You Need to Know Before Distributing Your Infographic
What You Need to Know Before Distributing Your Infographic Improve your audience outreach efforts by learning why and how to use the best social, owned and earned platforms available. Targeting specific
Formal Methods for Preserving Privacy for Big Data Extraction Software
Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
IBM Social Media Analytics
IBM Analyze social media data to improve business outcomes Highlights Grow your business by understanding consumer sentiment and optimizing marketing campaigns. Make better decisions and strategies across
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Social Media Marketing for Small Business Demystified
Social Media Marketing for Small Business Demystified General Overview, Strategies and Tools for Small Business Marketing on Social Media [Learn How to Effectively Use the Social Media for Making Connections
