Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership

Size: px
Start display at page:

Download "Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership"

Transcription

1 Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership Adiya Abisheva ETH Zürich Venkata Rama Kiran Garimella Qatar Computing Research Institute David Garcia ETH Zürich Ingmar Weber Qatar Computing Research Institute ABSTRACT By combining multiple social media datasets, it is possible to gain insight into each dataset that goes beyond what could be obtained with either individually. In this paper we combine user-centric data from Twitter with video-centric data from YouTube to build a rich picture of who watches and shares what on YouTube. We study 87K Twitter users, 5.6 million YouTube videos and 15 million video sharing events from user-, video- and sharing-eventcentric perspectives. We show that features of Twitter users correlate with YouTube features and sharing-related features. For example, urban users are quicker to share than rural users. We find a superlinear relationship between initial Twitter shares and the final amounts of views. We discover that Twitter activity metrics play more role in video popularity than mere amount of followers. We also reveal the existence of correlated behavior concerning the time between video creation and sharing within certain timescales, showing the time onset for a coherent response, and the time limit after which collective responses are extremely unlikely. Response times depend on the category of the video, suggesting Twitter video sharing is highly dependent on the video content. To the best of our knowledge, this is the first large-scale study combining YouTube and Twitter data, and it reveals novel, detailed insights into who watches (and shares) what on YouTube, and when. 1. INTRODUCTION On July 11, tweeted: so many activities it is making my head spin! haha sharing a link to a short YouTube movie clip. In one day, the video received more than 100,000 views, and its owner commented: So I checked my today to find 500 new mail... WTF I thought... 5 mins later I discover that Justin Bieber has tweeted this video.... The viewers of that video came from the 40 million followers that Justin Bieber has in Twitter, including large amounts of pop-loving teenagers that retweeted the video link more than 800 times in the following days. The above example illustrates how the combination of Twitter and YouTube data provide insights on who watches what on YouTube Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from WSDM 14, February 24 28, 2014, New York, New York, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM /14/02...$ and when. In this article we combine large datasets from both online communities, aiming at a descriptive analysis of the demographics and behavioral features of YouTube viewership through Twitter video shares. In our analysis, the who refers to the identity of Twitter users, as displayed on their public profile. We quantify this identity in three facets: a) demographic variables such as gender and location, b) social metrics that include reputation and impact metrics in the Twitter follower network, and c) personal interests and political alignment inferred from profile descriptions and followed accounts. The what refers to features of the videos, including i) the YouTube category, and ii) the popularity of videos in terms of views or likes. The when is the time lapsed between the creation of a video and its sharing in Twitter, measuring the time component of individual and collective reaction patterns to YouTube videos. With a combined dataset of Twitter data and YouTube videos we can answer questions about the interaction between both communities. First, we explore the purpose of social sharing, distinguishing regular and promotional Twitter accounts linked to a particular YouTube channel. We then analyze to which extent the content of the videos watched by a user is similar to their interests on Twitter. Using features extracted from Twitter, we are able to quantify factors such as social sharing and influence and infer their effect on the videos consumed on YouTube. We also look at the role of political alignment in YouTube video sharing, comparing the shared topics and reaction patterns of individuals depending on their political activity on Twitter. We analyze the times between video creation and social sharing, looking for factors that mediate the speed of video sharing. We find the demographics of users that share videos earlier than the rest, and compare how different categories elicit faster or slower reactions in Twitter. Finally, we explore the relation between the early Twitter shares of a video and its final popularity. To do so, we designed a model that includes social impact and reputation metrics of the early watchers of the video, providing early forecasts of a video s ultimate popularity. 2. RELATED WORK Since we answer who?, what? and when?, we describe related work done on Twitter profiles and online demographics, YouTube viewership and content and temporal behaviour patterns. 2.1 Online Demographics Related work on who does what in Web search has been done in Weber and Jaimes [38] where authors analyze query logs of 2.3 million users form a web search US engine. Even though our

2 work performs analysis on Twitter and YouTube users rather than Web search users, methodology used in previous study is of high relevance for our research. More closely related work on Twitter demographics was performed in Mislove et al. [26] where authors investigate whether Twitter users are a representative sample of society. By using (optionally) self-reported and publicly visible data of Twitter users, authors compared demographics of Twitter US users to the US population along three axes. On the geographical dimension, findings showed that Twitter users are overrepresented in highly populated US counties and underrepresented in sparsely populated regions due to different patterns of adoption of social media across regions. Across gender, the male Twitter population is greater than female especially among early Twitter adopters, but male bias decreases as Twitter evolves. On race/ethnicity authors show the distribution is highly geographically-dependent. Another study on demographics by Goel et al. [16] shows that user demographics (age, gender, race, income etc.) can be inferred from Web browsing histories. Finally, Kulshrestha et al. [22] investigate the role of offline geography in Twitter and conclude that it has a significant role in social interactions on Twitter with more tweets and links exchanged across national boundaries. 2.2 Research on Twitter Data Apart from demographics, Twitter has also been studied from other perspectives: prediction of trends/hashtags [2, 34, 19]; notions of influence in Twitter [6] and using Twitter predictive data for elections and discovering political alignment of users [8, 9]. Asur et al. [2] studied trending topics/hashtags and discovered that the content of a tweet and retweeting activity rather than user attributes such as influence, number of followers and frequency of posting are the main drivers for spotting the trend and keeping it alive. In Wang and Huberman [34] a model for attention growth and persistence of trends is presented and is validated on trending topics in Twitter. In another work on trending topics, hashtags in Twitter can be clustered according to the temporal usage patterns of the hashtag: before, during and after peak of its popularity. Furthermore, the class of the hashtag correlates with social semantics of content associated with the hashtag (Lehmann et al. [24]). In a study on differences of search activity of trending topics in the Web and Twitter, Kairam et al. [19] reveal that information-seeking and information-sharing activity around trending events follows similar temporal dynamics, but social media leads Web search activity by 4.3 hours on average. More generalized study on differences between Web and Twitter search by Teevan et al. [32] found that timely and social information are primary drivers for searching on Twitter, compared to more navigational search on the Web; Twitter search is more used to monitor new content, while search on the Web is performed for developing and learning about a topic. Another perspective is the notion of influence in Twitter using followers, retweets and mentions studied by Cha et al. [6] with main finding that having a lot of followers does not necessarily mean having a high influence. Another line of work uses Twitter to monitor political opinions, increase political mobilization, and possibly predict elections results. Conover et al. [9] present several methods to discover political alignment of Twitter users by analysing the network of political retweets and hashtags usage. In subsequent work [8], authors go beyond discovering political groups in Twitter, and analyse interaction dynamics of politically aligned subcommunities. Their findings show that right-leaning Twitter users produce more political content, spend a greater proportion of their time for political conversation and have more tightly interconnected social structure which leverages broad and fast spread of information. 2.3 YouTube Video Consumption Ulges et al. [33] use YouTube concepts to predict demographic profile of viewers and also try to use demographics estimated from views statistics to predict the concepts of a video. They show that the use of demographic features improves the quality of prediction. Concerning YouTube video views, researchers have analyzed time series [11], and predicted the final views count based on properties of growth on YouTube [31]. For instance, Crane and Sornette [11] perform analysis of collective responses to YouTube videos through their time series of views. Among the classes, the most usual were videos that have a fast decaying amount of views, receiving negligible amounts of views soon after their creation. Laine et al. [23] highlighted the role of exogenous factors (such as interest groups) in the activity of YouTube viewers and Qiu et al. [29] suggested two different mechanisms that drive YouTube viewership: popularity and quality filtering. On popularity of videos in YouTube, Figueiredo et al. [12] found copyright videos gain 90% of their views early in lifetime compared to top listed YouTube or randomly chosen videos; top listed videos show quality popularity dynamics pattern opposed to copyright and random videos exhibiting viral, word-of-mouth, dynamics. And finally, related videos and internal search are most contributing towards content dissemination, but for random videos social link is also a key factor. On politics in YouTube, recent work by Garcia et al. [14] performs analysis of collective responses to the YouTube videos of US political campaigns and reveals differences in collective dynamics that suggest stronger interaction among right-leaning users. Weber et al. [36] use YouTube video tags and conclude that general YouTube videos are not polarized in terms of audience, but for subclasses of apolitical videos (e.g., tagged as army ) an audience bias can be predicted (right-leaning in this case). Finally, in Crane et al. [10] collective responses to the videos of Saddam Hussein s death show an extremely fast response and relevance of news and politics for YouTube viewers. 2.4 On Human Behaviour Since our analysis involves the when dimension of video shares on Twitter, we review work on temporal patterns in human behaviour. Quantitative understanding of human behavior, also known as human dynamics, got a new turning point in 2005 after work by A.-L. Barabási [3], where author looked whether the timing of human activities follows any specific pattern. Results showed that there are bursts of intensive activity interchanged with long periods of inactivity (Pareto distribution) rather than events happening at regular time intervals (Poisson). Since 2005 more studies on the inhomogeneous nature of temporal processes in human dynamics have been performed [10, 20, 39]. Various proxies were used to get timing of human activity, e.g., mobile records, web server logs, SMS etc. Recent study by Wu et al. [39] suggests time patterns follow bimodal distribution with bursts of activity explained by power-law distribution in the first mode and exponentially distributed initiation of activity in the second mode. 3. DATA SET We collected data from Twitter and YouTube for our analysis, and related the datasets by looking at instances where links to videos were shared on Twitter. This section describes how we obtained the 87K Twitter users, 5.6 million YouTube videos, and 15 million video sharing events we analyzed in greater detail. Data sets are available at 2013_YouTube_Twitter_ETH_QCRI/index.html.

3 3.1 Twitter The data acquisition starts with a 28 hour time slice from 6/6/ :00 to 8/6/2013 1:00 (AST) of all public Tweets containing any URL provided by GNIP, a reseller of Twitter data. Of these tweets, only tweets by users with at least one follower, one friend, has nonempty profile location and English as profile language were considered. 1,271,274 tweets containing a URL from youtube.com or where identified. URLs shortened by Twitter s default URL shortener t.co were automatically unshortened, but other services were not considered. From this set, 200K distinct tweets were sampled uniformly at random. These tweets account for 177,791 distinct users. Out of these, 100K users were sampled uniformly at random. For each of these users we obtained (up to) their last 3,200 public tweets. In 12,922 cases this failed because the user account had been removed or made private. Along with the tweets, we obtained the user s public profile, containing the user-defined location, their followers and friends count and the set of (up to) 5,000 friends (= other Twitter users the user follows) and 5,000 followers (= other Twitter users who follow this user). 96.8% of our users had less than 5,000 followers and 98.9% had less than 5,000 friends. We also got the profile information for all these friends and followers. Finally, we had 17,013,356 unique tweets with 5,682,627 distinct YouTube video IDs, 19,004,341 friends and 22,182,881 followers for the 87,076 users. From this data we extracted a number of features related to (i) demographics, (ii) location, (iii) interests, and (iv) behavior on Twitter. Demographics. We used a name dictionary to infer the selfdeclared gender of a Twitter user using common first names and Category Sports Movies News & Politics Finance Comedy Science & Technology Non-profits & Activism Film & Animation Sci-Fi/Fantasy Gaming People & Blogs Travel & Events Autos & Vehicles Music Entertainment Education Howto & Style Pets & Animals Shows wefollow.com Interests sports, baseb., basketb., soccer, footb., cricket, nfl movies economics, politics, news banking, investing, finance, entrepreneur, business comedy, comedian tech, technology, gadgets, science, socialmedia non-profits, non-profit, charity, philanthropy film, animation, cartoons scifi, sciencefiction, fantasy games, gaming blogger, blogs, people, celebrity travel, places automotive, autos, cars, vehicles music, dance, dancer entertainment academic, university, education howto, diy, doityourself animals, cats, dogs, pets tv, tvshows, media Table 1: Mapping of YouTube categories (left) to wefollow.com interests (right). The YouTube category Trailers was not mapped. The non-youtube category Finance was added. Finally, we also aggregated features from all YouTube videos shared by a user into statistics such as the average view count or the median inter-event time ( lag ) between video upload and sharing. These features are described in more detail in the next section. 3.2 YouTube Activity on Twitter Given 17,013,356 unique tweets with YouTube video IDs, we retrieved 15,211,132 sharing events and identified 6,433,570 unique gender from YouTube video IDs. We define sharing event as a tweet containing valid YouTube video ID (having category, Freebase topics and html. To detect a subset of potential parents, we scanned each user s bio for mother/mom/wife or father/dad/husband using exact token match. Similarly, we identified a subset of potential stu- sharing events. A fraction of videos in initial 17 million tweets timestamp), thus a tweet with two video IDs is considered as two dents by scanning the bios for student/study/studying. were not valid, thus such tweets and consequently derived sharing events were removed. Using the YouTube API, the follow- Location. Each of the users profile locations was run through Yahoo! s Placemaker geo-coder, ing data about videos was crawled within the period 7/7/2013 com/yql/console/, and for 61,250 profiles, a location could 1/8/2013: title, uploader username, number of views, number of be identified. For the 23,416 users with an identified location in times video has been marked as favorite, number of raters, number the US we checked if their city matched a list of the 100 biggest of likes and dislikes, number of comments, video uploaded time US cities from and categories to which videos belong. Using the Freebase API This gave us an estimate of users from rural vs. urban areas. we also crawled video topics which serve as deprecated video tags Interests. To detect interests of users, we chose to analyze the and are helpful for searching content on YouTube, e.g., hobby, users they follow. These friends were then compared against directory information from 1. Concretely, (http://www.freebase.com/). music, book and many others are examples of Freebase topics we obtained information for the classes Sports, Movies, News & The cleansing stage of data contained three parts: identify noise Politics, Finance, Comedy, Science, Non-profits, Film, Sci-Fi/Fantasy, in data, introduce a filter on Twitter users with extreme behaviour Gaming, People, Travel, Autos, Music, Entertainment, Education, and introduce a filter on legacy YouTube videos (see Section 5). Howto, Pets, and Shows as described in Table 1. In addition to the In our data set, noisy data (0.53%) are those sharing events where information from wefollow.com, we labeled 32 politicians or party the tweet s timestamp is earlier than the video s upload timestamp. accounts on Twitter as either Democrats (13) or Republicans (19). Such negative lags spanned from 1 second up to a couple of years. The same list was also used by Weber, et al. [35, 37]. Users were We removed all such sharing events which seemed to occur 1) due then labeled as left or right according to the distribution of users to updated timestamp of streamed live videos recorded by YouTube they followed (if any). Following had previously been shown to be where the time at the end of streaming is returned as published a strong signal for political orientation [7, 4, 35, 37]. timestamp by YouTube API, and 2) due to altered timestamp of Behavior. To quantify the activity of a user on Twitter, we extracted various features such as their number of tweets, the fracter removing noise, the data reduced to 15,130,439 sharing events, reuploaded videos by some YouTube privileged accounts. Aftion of tweets that are retweets, or the fraction of tweets containing 5,669,907 unique video IDs and 87,069 user IDs. URLs. Handling the data, we came across non-human behaviour explained by automated video sharing. We identify Twitter accounts 1 WeFollow is a website listing Twitter users for different topics and YouTube channels possibly owned by the same user, and label such Twitter users, as promotional since the primary content along with a prominence score, indicating importance of the user in the respective field. WeFollow s directory has been used in several academic studies [25, 5, 1, 27] percentile of Twitter users sorted by the number of YouTube of such videos is advertisement. These accounts are often in top 1 videos

4 shared. Examples of such Twitter-YouTube pairs with the number of shared videos in brackets are: spanish_life aspanishlife (8,119) on real estate advertisement and RealHollywoodTr bootcampmc (5,315) blogging on fitness and health, while the mean number of shares per user was found at 174 video shares. To remove promotional users, we applied a filtering mechanism based on a) similarity between usernames in Twitter and YouTube using longest common substring (LCS), and/or b) amount of videos in Twitter account coming from one YouTube source; for details refer to supplementary material submitted in: We follow an aggressive approach when detecting promotional users; thus, there is a possibility of some regular users being labeled as promotional but not the other way round. As a result of filtering we split Twitter accounts into 71,920 regular nonpromotional and 15,149 promotional accounts. 4. WHO WATCHES WHAT? In this section, we present a first analysis of who (in terms of Twitter user features) watches and shares what (in terms of YouTube video features). Though we include here user features related to the inter-event time, early video sharers are analyzed in Section Cluster Analysis As a first picture of who watches and shares what we present a cluster analysis of 26,938 non-promotional, sufficiently active users who shared at least 10 YouTube videos and had at least 10 friends matched on Twitter through WeFollow (see Table 1). These users were clustered into eight groups according to the (normalized) distribution of YouTube categories of the videos they shared using an agglomerative hierarchical clustering algorithm with a cosine similarity metric [21]. Table 2 shows the results. We were interested to see which differences for Twitter features are induced when users are grouped solely according to YouTube categories. To describe the clusters found, Table 2 first lists the discriminative YouTube features as output by the clustering algorithm. Below it lists the 5 most prominent terms from the Twitter bios of users in this group. These terms, which were not used to obtain the clustering, give fairly intuitive descriptions of the user groups. Finally, the table lists features whose average value differs statistically significantly (at 1%) between the cluster and all 27K users. These features are ranked by the absolute difference between global and within-cluster averages, divided by the standard deviation. Inspecting the clusters, certain observations can be made. First, the discriminating YouTube categories (first block of five lines) are largely aligned with Twitter categories that are over-represented in the corresponding cluster (The T * in the bottom block of five lines). This alignment we will investigate more in Section 4.3. Second, there are certain correlations between the demographics and the YouTube categories. For example, Cluster 1 is focused on sports and has more male users, whereas Cluster 7 is centered around entertainment and people/blogs and has more female users. Recall that the clustering was done according to YouTube categories, whereas the demographic information comes from Twitter, indicating the possible benefits of the combination. Finally, the clustering also picks up a connection to political orientation. Concretely, Cluster 5 contains more conservative users with an increased interest in news and politics (more on this in Section 4.3). 4.2 Demographics To understand the significance of the influence of variables such as gender or occupation on (i) the number of views, (ii) the polarization or controversiality 2, and (iii) the lag we applied a so-called permutation test [17], which unlike other tests does not make assumptions on the distribution type of the observed variables. To test, say, the impact of stating student in the Twitter bio on the number of views we first computed the average view count for all views by the student group and compared this with the average for the complement non-student group. Let δ be the observed difference. Then to test the significance of δ we pooled all the student and non-student labeled observations and randomly permuted the two labels to get two groups. For these two groups, obtained by a label permutation, a δ p was then computed. This process was repeated 10,000 times to estimate the common level of variability in the δ p. We then marked the δ as significant if it was in the bottom/top 0.5% (or 2.5%) of the percentiles of the δ p. In Table 3, a indicates that δ was in the bottom/top 0.5% and indicates that it was in the bottom/top 2.5%. For Table 4 we used a similar procedure to test the statistical significance of the Spearman rank correlation coefficient. Here, to establish the common level of variability we randomly permuted both rankings to be correlated 10,000 times and observed the distribution of the Spearman rank correlation coefficients. If the original, actual coefficient fell within the bottom/top 0.5%/2.5% we marked it as significant. Table 3 shows correlations with respect to the per-user median (i) number of views of shared videos, (ii) polarization/controversiality of shared videos and (iii) of inter-event times. One of the demographic differences that can be spotted is that men compared to women share less popular (fewer views) videos earlier (smaller lag). Some differences are hidden in this analysis though, as both urban and rural users seem to have a lower lag (share fast). The explanation for this apparent paradox is that users who have either no self-declared location or where the location is outside of the US have a comparatively larger lag, and that the comparison is with non-urban and non-rural, which mostly consist of these users, see Section 5.1 for more details. 4.3 Correlation Analysis In this section we analyze the relationship between Twitter user features, such as the number of followers or the fraction of tweets that contain a hashtag, and YouTube features, such as the number of views. As a simple analysis tool we computed Spearman s rank correlation coefficient for each pair of features. To simplify the presentation, we group the Twitter features into four classes. First, to see how social a user is we look at (i) the number of friends, and (ii) the number of distinct users mentioned. Second, to see how common sharing is for a user we included the fraction of tweets that (i) are retweets, (ii) contain a hashtag, (iii) contain a YouTube URL, and (iv) contain a non-youtube URL. Finally, we look at notions of influence that includes (i) the number of Twitter followers, (ii) the fraction of a user s tweets that are retweeted, (iii) the average retweet count of tweets that obtained at least one retweet, and (iv) the average number of followers of a user s followers. For YouTube we consider the medians of (i) the number of views of videos shared by a user, (ii) the polarization of these videos, and (iii) the time lag of the video sharing events of the user. The 2 We calculate the polarization that a YouTube video creates on its viewers through its amounts of likes L v, dislikes D v, and total views V v, through the equation P ol v = Lv D v. The Vv Vv rationale behind this calculation is the rescaling of the likes and dislikes ratio based on the fact that they do not grow linearly with each other. The exponents correspond to the base rates of the logarithmically transformed amounts of views, likes and dislikes. This way we standardize the ratio over their nonlinear relation.

5 Discriminating features Top profile words Top features Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7 Cluster8 (2740) (2327) (2493) (5390) (2535) (4052) (3697) (3704) sports animals non-profit music news/politics film/animation entertainment travel music music music non-profit music education people/blogs music entertainment entertainment sports sports comedy music howto gaming people/blogs people/blogs entertainment education entertainment non-profit sports science/tech non-profit sports education animals education sports music autos fan music music music music music life music music life life life life life music gamer sports fan fan artist world fan fan life life lover world producer conservative lover live fan football writer lover live people time justin youtube T sports + Y animals + Y non-profit + Y music + Y news/politics + Y film + Y howto + Y gaming + Y sports + T animals + T non-profit + median lag + Y comedy + Y education + Y people + Y science/tech + male + std dev. of lag + num. usrs rtwd + T music + T news/politics + frac. Tw other URLs Y entertainment + T gaming + frac. Tw other URLs + acnt created at frac. of usrs Tw rtwd + mean lag + frac. Tw other URLs + T movies + female + Y shows + avg. frnds of frnds T education + frac. Tw Y videos Y education leaning republic + Tfilm + avg rtwt count user + num. vids shared + Table 2: Clusters obtained by clustering normalized YouTube categories distributions for each user. number of comments received by videos shared by a user behaved qualitatively identical to the number of views and is omitted. Our results are presented in Table 4. Each cell in the table links a Twitter user feature group (row) with a particular YouTube video feature (columns). The three symbols in the cell indicate + = significant (at 1% using a permutation test as previously described) and positive, - = significant and negative, and 0 = not significant or below The symbols are in the order of the features listed above in the text. Certain general observations can be made. For example, all of our notions of social correlate with a drop in lag time, and out of the topics considered, News & Politics is the one that is most consistently linked with users who actively share. But other observations are more complex and, for example, only some but not other notions of influence correlate positively with a large number of views. We also looked at relation between the Twitter user features and the fraction of video shares for various YouTube categories. Table 4 shows results for the three example categories Music, Sports and News & Politics. Again, different patterns for different definitions of influence can be observed. Out of the three topics, News & Politics is the one that correlates most with social and with sharing behavior. 4.4 Interests on Twitter vs. YouTube Given that our analysis links Twitter behavior to YouTube sharing events it is interesting to understand if the interests on the two platforms are aligned. Though we cannot reason about YouTube views not corresponding to Twitter sharing events, we compared the topical categories of a user s shared videos with the topical categories of their Twitter friends. To infer the latter, we used the WeFollow data described in Section 3.1 where entries in WeFollow were also weighted according to their prominence score. This way, a user (prominence 99) is given a higher weight for sports than a user (prominence 23). To compare if a user s YouTube category distribution and Twitter friends WeFollow distributions are similar, we decided not to compare these directly due to the following expected bias. The coverage by WeFollow for the different categories is likely to differ. male fe- urban rural stu- mo- fa- US male dent ther ther views polariz lag Table 3: Demographics. A + indicates a positive deviation from the general population, - negative and 0 not statistically significant. indicates that the significance was based on δ being in the bottom/top 0.5%, for the bottom/top 2.5%. For a popular topic such as music, the coverage is potentially overproportionally good compared to less popular ones. To correct for this, we first normalize as follows. Let c T ij the prominence-weighted fraction of a user i s Twitter friends that are recognized in the WeFollow category j. Similarly, define c Y ij for their shared YouTube category distribution. Now normalize both of these matrices for a fixed category j such that ĉ T ij = c T ij/ k ct kj. This, effectively, compares users according to their relative interest in a given topic. This is then further normalized to obtain per-user probability distributions via c T ij = ĉ T ij/ k ĉt ik, similarly for c Y ij. Then, for each category j, we look at the distribution of the differences c T ij c Y ij across users i. Categories where this difference is positive indicate a relatively higher importance/preference for Twitter, cases with a negative preference indicate a relatively higher importance for YouTube. Generally, the differences were very small with the median difference not exceeding.04 in absolute value for any category and being smaller than.01 for more than half. Some categories such as Film & Animation were very slightly more prominent on YouTube (indicated by the negative mean and median), whereas Science & Tech was slightly more prominent on Twitter. This analysis was done for active users with at least 10 shared videos and at least 10 friends matched on WeFollow. 4.5 Politics in Twitter and YouTube To see how Politics is introduced in both Twitter and YouTube, we had the following questions in mind: a) which political user groups share more politically charged content, b) what is the most frequent content of each political user group. As mentioned in Section 3.1, to separate users into political groups we followed a US bipartite system with audience divided into left (L) and right (R) users. Users that followed more of the 13 left seed users were marked as left-leaning, users that followed more of the 19 right seed users were marked as right-leaning and users with a split preference or not following any seed user were marked as apoviews polariz. lag Music Sports News Social Sharing Influence Table 4: Columns 1-3 show the relation between Twitter and per-user aggregated YouTube features. Columns 4-6 show the relation between Twitter and fractions of categories of YouTube videos shared for three example categories. Twitter features are grouped into three classes. Symbols indicate strength and direction of significance. Bold symbols indicate an absolute value of Spearman s Rank correlation coefficient > 0.1. See text for details.

6 litical. Our approach resulted in three disjoint sets of left users U L ( U L = 11, 217), right U R ( U R = 1, 046) and apolitical users U A ( U A = 57, 672). We addressed question a) by looking at how much L, R, A users share videos in the category News & Politics. If left-leaning user u L shared set of videos V L with a subset of videos in the category News & Politics, VL News&Politics V L; then we looked at the distribution of ratio of number of political video shares to total amount of shares per each u L, u R and u A: r {ul,u R,u A } = News&Politics V {L,R,A} V {L,R,A}. On average mean ratio of videos with political content for each user population is: µ L = 0.06, µ R = 0.29, µ A = 0.05, which confirms right users share more news and politics related videos compared to left users and apolitical users. To answer question b) we calculated topic distributions of videos per each political user category and rank topics in each user group according to their frequency. In order to statistically compare the ranking of topics across groups, we applied the distance between ranks of topics method by Havlin [18]. If R 1(λ) is the rank of topic λ in user group 1 and R 2(λ) is the rank of the same topic λ in user group 2, distance r 12(λ) between the ranks of topic λ in two user groups is r 12(λ) = R 1(λ) R 2(λ). Thus, the distance between two user groups is defined as the mean square root distance between λ r2 12(λ)) 1 2, where N the ranks of all common topics: r 12 = ( 1 N is the number of common topics across user groups. We summarize the distance metric across four user groups: Left, Right, Apolitical and all population (Left, Right and Apolitical) with N = 23, 844 and R max = 281, 265 in Table 5. We find that the distances from right users is maximum to left, apolitical and all, and left and apolitical are close to each other in terms of distance. This suggests that right users have their own hierarchy of topics distinguished from left and apolitical users, while latter groups have more similar topics. To support our findings in distance between topic ranks, we look at the most 20 frequent Freebase topics for each user group. Right users share more politically charged content including politicians (Barack Obama, Alex Jones, Ron Paul), news channels (Russia Today, The Young Turks), military-related keywords (Gun, Police) and concepts (USA). Conversely, left-leaning users have similar interests as apolitical, giving priority to entertainment videos. For example, Barack Obama topic (Freebase ID /m/02mjmr) is placed 30th popular among left users and 1st among right population. Results of a) and b) support each other and give the following picture on political engagement of L/R/A user groups. For left users, a) says they act as apolitical users and on average do not share much political videos, with b) confirming that among top 20 video topics of left users none relate to politics. And for right users, a) states that they share more political content which is supported by b) where 9 out of top 20 topics have government, news, politics related concepts. A possible explanation of the fact that the supposed left is much closer to the apolitical set than the right is that is not a good proxy for political orientation due to his popularity in social media. To show that is a signal for both a) being more politicized and b) being more left-leaning we perform a number of statistical tests on differences followers and non-followers. For a) we Left Right Apolitical All Left Right Apolitical All Table 5: Distance across political, apolitical and all user groups. count the number of known political hashtags such as #p2, #tcot, #obama, #ows and others for both user groups. For b) we count the number of words liberal, progressive, democrat and conservative, republican in the bios of both followers and nonfollowers. The idea here is that the first (abbrev. L-words) and second (abbrev. R-words) word groups are indicators of someone being left- and right- aligned respectively. Table 6 shows results with a clear message: followers are at least 4 times more likely to be left-aligned compared to non-followers (0.70% vs. 0.16%) and are twice more likely to insert political hashtags in their tweets compared to non-followers (20.5% vs. 10.3%). Ratios were tested with a Chi-square test for equality of proportions with a 95% confidence interval with significance at p-value < EARLY VIDEO ADOPTER This section answers a) who shares video content faster and b) which information is shared faster. Thus, we look at another dimension linking Twitter and YouTube the time lag between the video upload and the sharing event on Twitter, also known as inter-event time or lag and denoted as t. We perform inter-event time analysis on a system level and per user. For system inter-event time analysis we collected time lags, t w v, per sharing event (tweet w, video v), resulting in time lag collection T, i.e., w TWEETS, v VIDEOS, t w v T, where TWEETS is a set of all tweets in data set and VIDEOS is a set of all videos. Thus, a user having more than one tweet with video has more than one time lag; similarly, a video that has been shared more than once will have more than one time lag; thus, several sharing events of a video are considered as separate sharing events, and time lag of each such event becomes a member of collection T. For per user inter-event time analysis we calculated median time lag per each user u, t median u. One limitation of the YouTube dataset was a non-uniform distribution of video age. Thus, we removed videos before certain epochs when YouTube and Twitter underwent changes. First, Twitter was founded in 2006, nearly one year after YouTube, thus we cannot sensibly study sharing of videos uploaded in The next disrupting event is the introduction of Twitter share button in YouTube on 12/8/2010, changing the ease of sharing. Additionally, our crawled dataset had another constraint: a limit of 3,200 tweets per user which mainly has effect on tweets sample of active Twitter users. Selected sample potentially contains only recent tweets and thus relatively young videos in those tweets. In order to get a uniform age of shared videos, we determined a cutoff time for discarding videos of certain age at which amount of shares per user is affected the least. We removed videos older than θ = 1/1/2012, which automatically discards tweets containing such videos. The filtered data set contained 11, 697,550 sharing events for 2,510,653 distinct videos coming from 70,874 non-promotional users. 5.1 Who Shares Faster in Twitter Question a) was addressed by comparing inter-event times per different user groups. We first looked at time differences between promotional and non-promotional Twitter accounts, see Figure 1 from a system s point of view (rather than aggregating per-user). Visually, we observe that promotional accounts are faster at shar- Political # L-words R-words Total followers 20.5% (3829) 0.70% (130) 0.28% (53) followers 10.3% (8615) 0.16% (131) 0.35% (281) Table 6: Percentage and counts in brackets of users having political tweet hashtags, left - and right - words in account description followers and non-followers.

7 P(Δt) Promotional Not promotional Δt, sec F(Δt) Promotional Not promotional Δt, sec Figure 1: Inter-event time distribution P ( t) and accumulative time distribution F ( t)s of promotional (red) and nonpromotional (blue) accounts. ing content compared to regular users, see the head at P ( t) and tail at F ( t). Statistically, median(promo)= sec (18 hours), median(non-promo) = sec (38 hours). Within an hour promotional accounts have twice amount of shares compared to nonpromotional accounts which constitutes twentieth and tenth percentile respectively. Having confirmed that there is a difference between human and machine behaviour, we performed a per-user inter-event time analysis for different user groups of non-promotional accounts. For each user group U G we calculate the median lag per group (median of users medians): t G = t median u median u U G. For example, in Section 4.5 we looked at who shares what per political user groups (Left vs. Right). Here we find that on average right users share newly uploaded video content at least 3 days earlier compared to left users. Note that the set of videos being shared is different though. Our findings on the median of the median inter-event times for various user groups are presented in Table 7. Time differences in the per topic medians follow the same trend as the overall distribution (not presented here), so the observed differences cannot solely be explained by differences in category preferences for different user groups. We highlight the following observations on who shares faster: concerning location, urban users are around 14 hours faster than rural users, and across gender women are much slower compared to men. Globally, people from Indonesia and Thailand have a reaction time in the order of a day, where as the greatest lag in the order of a half of a month is observed from people tweeting in Brazil. But as we selected only English profiles the results for other countries might be conflated with other factors. While doing our analysis we also observed that an important dimension of the quickness of the users relates to how often they share videos on Twitter. Figure 2 shows the median per-user median of the inter-event times for users divided into deciles according to the number of YouTube videos they have shared. The inter-event times are given in hours and range from 352 hours for the least active to 38 hours for the most active users. As the difference is quite striking, we inspected term clouds for the Twitter bios of the least active YouTube sharers and the most active YouTube sharers. Interestingly, the two are quite similar, apart from a prominent YouTube for the most active users, indicating that the difference in lag time is related to the activity level, not topical interests. 5.2 What is Shared Faster in Twitter To answer question b) we performed system inter-event time analysis and distributed time lags in T into relevant video category. If T C is a collection of system lags of set of videos belonging to category C (VIDEOS C ), then t w v T C, if v VIDEOS C. YouTube provides 19 video categories, in Figure 2 inter-event time Median of per-user median Δt, hrs Bucket number F(Δt) Entertainment, med = 1.345*10 5 Gaming, med = 2.99*10 4 Movies, med = 1.344*10 7 News & Politics, med = 5.519*10 4 Pets & Animals, med = 3.147*10 5 Trailers, med = 7.45* Δt, sec Figure 2: Median of per-user median inter-event times for users bucketed (into deciles) by the number of YouTube videos shared (left). Accumulative time distribution F ( t) of videos belonging to various YouTube categories (right). of 6 categories which exhibit different patterns time distribution are shown, due to space limits. Remaining 13 video categories lag show similar patterns as Entertainment and Pets & Animals. Our findings show that among all videos, Gaming and News & Politics videos are the fastest shared with median time of 8 and 15 hours respectively, Movies and Trailers have the greatest lag between video uploaded and being tweeted with median of 5 and 3 months respectively. 6. VIDEO POPULARITY ANALYSIS 6.1 Forecasting Video Popularity In this section, we present our work on early indicators of the popularity of a video, i.e., its amount of views a sufficient amount of time after its creation. Our approach is based on analyzing the Twitter attention to the video in the first moments after its creation, including the user profile information explained above. For this task, we filter our data following the cutoff date explained in Section 5, and restrict our analysis to videos that were created before June 1st 2013, a total of 4,822,675 videos created more than a month before the data retrieval date. We estimate the popularity of a video through the amount of views more than a month after its creation, following previous approaches by Szabo and Huberman [31], in line with the very fast decay of views that most videos have in YouTube as shown in Crane and Sornette [11]. Category Med. int. time num. users promotional non-promotional promotional urban promotional rural non-promotional urban non-promotional rural left right male female student not student mother not mother father not father Table 7: Comparison of median of median inter-event times (in hours) for various groups of users

8 For each video, we analyze its Twitter attention during the first week after its creation. We remove from our analysis all videos that, during this first week, did not have any sharing event in our data. This removes old videos that were created before Twitter grew to its actual user base, leaving us with a set of 276,488 videos. To analyze the role of user interests and promotions, we divide our analysis of Twitter data in two subsets: one only based on promotional users, and one based on non-promotional users. After such filtering, we have a total amount of 1,200,924 shares and 182,135 videos from promotional users, and 779,821 shares and 133,373 videos from non-promotional users. Note that these two datasets are disjoint in terms of Twitter data. No Twitter share is taken into account in both, but they overlap in 17,093 videos. 6.2 Twitter Video Metrics We measure the early Twitter attention towards a video aggregating two types of data: i) amount of tweets or attention volume, and ii) reputation metrics calculated from the follower network and retweeting behavior of the users involved. For each video, we computed five metrics of Twitter attention that summarize different factors that potentially increase video popularity: We measure the total attention in Twitter to a video through the amount of shares S v during the first week, which were produced by the set of users that shared the video in the first week, noted as U v U. Each user u created n v(u) shares of the video, which were received by the set of followers of those users. We define the exposure E v of a video as the sum of followers of the users that shared the video in the first week, where F (u) is the set of followers of user u, and f(u) = F (u). This measure approximates the size of the first order neighborhood of the accounts sharing the video, overcounting their common friends. We aggregate the social impact I v of the users that shared the video estimated as their mean amount of retweets for tweets with nonzero retweets (R 0(u)). To improve estimation of the reputation of the users sharing the video in the first week, we approximate the size of the second-order neighborhood of the users that shared the video. For this, we calculate the second-order exposure E v, as the sum of the amount of followers of the followers of the users that shared the video. Each user exposed to the shares of the video is subject to have its attention diluted over a set of different information sources. For this reason, we calculate the share of voice A v of the early users, as the ratio of their amount of followers divided by the average amount of users followed by their followers, where f 1 (v) is the amount of users that v follows. This way, we correct the case of users with many followers, who would give a lower share of voice if they follow a large amount of other users. On the other hand, a user with a low amount of followers can have a large share of voice, when its followers do not follow many other accounts. We use these five metrics to create a video vector with a sixth dimension being its final amount of views. In the following, we present our analysis of the relations between these five metrics and the popularity of a video. 6.3 Factors Influencing Video Popularity Amount of shares Exposure Social impact S v = n v(u) E v = f(u) I v = R 0(u) u U v u U v u U v Second-order exposure Share of voice E v = f(u ) A v = f(u)/ f 1 (u ) u F (u) u U v u U v u F (u) Table 8: Twitter social metrics used related to video popularity. Youtube Views 1e+03 1e+04 1e+05 1e+06 1e+07 user accounts promotional accounts V S Twitter Shares Youtube Views 1e+03 1e+04 1e+05 1e+06 1e+07 user accounts promotional accounts V I e+00 1e+02 1e+04 1e+06 Social Impact [retweets] Figure 3: Mean amount of views videos binned by amount of shares (left) and social impact of early adopters (right). Error bars show standard error, and dashed lines regression results. The distribution of views per video, as well as the other metrics explained above, have large variance and are skewed to the right. To avoid the uneven leverage of extreme values of these distributions, we have applied a logarithmic transformation to each one of them, reducing their variance but keeping their rank. In the first step of our analysis, we computed correlation coefficients between the logarithm of the amount of views and the other five variables. The results for promotional and non-promotional data are summarized in Table 9, revealing significant correlations for all of them. Some of this correlations are of very low magnitude or even negative sign, suggesting a more careful analysis. Our first observation is that the amount of shares in the first week of a video is a better predictor for its popularity in the case of non-promotional users and promotional ones (ρ = vs ρ = 0.298). The left panel of Figure 3 shows the mean amount of views of videos binned exponentially by their amount of Twitter shares. The two types of user activities diverge after 20 shares in the first week, where for the case of non-promotional users the amount of views appears to be increasing but saturating. Regression on a power-law relation between views and shares V S α reveals a superlinear scaling with α = 2.18 ± 0.02, i.e., the final views of a video has a quadratic relation to the amount of regular user shares in the first week. As an example of this superlinear growth, the mean amount of views for videos with 2 shares in the first week is 151,374.5, for videos with 7 shares is 644,522.4, and for videos with 12 shares is 2,349,317. This gives an increase of almost 500K views for the five shares after the first two, but an increase of more than 1.7M for the five shares after the first seven. The diverging pattern in both types of user activity reveals that, when promotional accounts share the same video more than 20 times in the same week, the final amount of views does not increase. In fact, there is a decreasing pattern of views, suggesting the existence of information overload or spamming behavior in promotional users. For both types of Twitter users, the aggregated social impact in terms of mean retweet rates is the best predictor for the popularity of a video (ρ = and ρ = 0.28). The right panel of Fig. 3 shows the mean view values for bins of the aggregated social impact, with the result of regression of the form V I β, where Type S v E v I v E v A v Nonpr Promo Table 9: Pearson s (first value) and Spearman s (second value) correlation coefficients between video views and Twitter measures: ρ(log(v v), log(x)), all with p < X

9 Youtube Views 1e+03 5e+03 5e+04 5e+05 5e+06 user accounts promotional accounts V E e+00 1e+02 1e+04 1e+06 1e+08 Exposure [followers] Youtube Views 1e+02 1e+04 1e+06 user accounts promotional accounts V E e+02 1e+05 1e+08 1e+11 1e+14 Second-order Exposure [FoF] Figure 4: Mean amount of views videos binned by first and second order exposure. Error bars show standard error, and dashed lines regression results. β = ± for non-promotional users and β = ± for promotional ones. This result reveals a sublinear relation between the amount of views and the social impact of the accounts that shared the video in the first week, close to a square root. The amount of views of videos showed a low positive correlation coefficient with the exposure of the shares in the first week, measured through amount of followers. The left panel of Fig. 4 shows the mean amount of views versus the exposure in the first week, revealing a very soft increasing pattern in both. On the other hand, the amount of views has a more substantial correlation with the second-order exposure, with correlation coefficients of and for regular and promotional users respectively. The right panel of Fig. 4 shows this stronger relation, with a regression result of exponent ± for non-promotional users, and of ± for promotional ones. This comparison reveals that the second-order exposure is a much better predictor for the popularity of a video than the amount of followers of the initial sharers. This result calls for more stylized reputation metrics that take into account global information beyond amount of followers and retweet rates, for example centrality [15], or coreness [13] metrics. Finally, the aggregated share of voice of the accounts that shared the video during the first week did not provide clear results, with a significant negative correlation of for non-promotional users, and of for promotional ones. This suggest that, if information overload and competition for attention are present in Twitter, they need to be measured with more precise approximations that the correction we presented in the previous section. Nevertheless, the share of voice of the users sharing a video still contains relevant information that we introduce in the regression model we explain below. 6.4 Combining Data in a Regression Model The above results show the pairwise relation between the amount of views of a video and each one of our five Twitter metrics. This analysis ignores the possible effect of the combination of different metrics, as it can be expected that they are correlated with each other. To provide a deeper analysis on how these Twitter metrics influence the final amount of views, we propose a substitutes model in which the products of powers of each variable are proportional to the final amount of views: V v S α v I β v E γ v E δ v A κ v (1) This model is equivalent to a linear regression model after the logarithmic transformation of all the independent variables. Training this regressor on the promotional user data gives R 2 = 0.107, explaining about 10% of the variance of log(v ). On the nonpromotional user dataset, the regressor achieves R 2 = 0.199, explaining almost 20% of the variance of the final amount of views of a video based exclusively on information extracted from Twitter. Type S v E v I v E v A v Not promo Promotional Table 10: Regression coefficients for Eq. 1. Significance level p < 10 9, or p > 0.01 otherwise. This opens the possibility to improve previous models that used only data from YouTube [31, 28], which could also be combined with data from other online communities, as previously done in Soysa et al. [30] with a limited sample of Facebook data. The estimated coefficients for the exponents of Eq. 1 are reported in Table 10, which allow us to compare the size of the effects of each Twitter metric. This analysis reveals the lack of relevance of the first-order exposure for the case of non-promotional users as also shown in Cha et al. [6]. The correlation between first order exposure and views shown in Fig. 4 is a confound due to the correlation of exposure with other metrics, such as impact or second-order exposure. To assess the prediction power of our model for non-promotional users, we transformed the regression problem to a dichotomous classification, in which we tag a video as popular if it gathered at least 10,000 views. Using the regression model explained above, we can predict if a video will reach more than 10,000 views based on the first week of Twitter activity. If the estimator of Eq. 1 gives a value above 10,000, we classify the video as popular. We performed 10-fold cross validation on the non-promotional users dataset, fitting the regressor to 90% of the data and validating it on the rest 10%. The mean base rate of popular videos for the 10 evaluations is 0.493, and our predictor achieves a mean precision of and a mean recall of for the popular class. Both values are significantly above the precision of random classifiers over the same partitions, which produced a precision of and a recall of This experiment shows that, using Twitter data only, a prediction can achieve a precision value much higher than expected from a random classifier. 7. CONCLUSIONS We gathered a high-quality dataset based on the combination of two sources: 17 million unique public tweets for 87K users on Twitter and YouTube data for 5 million videos. Through this combination of data sets, we could obtain novel, detailed insights into who watches (and shares) what on YouTube, and when (that is, how quickly). We applied a set of heuristics to infer demographic data including gender, location, political alignment, and interests. We designed a new method to distinguish promotional Twitter accounts, who almost exclusively share their own YouTube videos and validated our expectation that promotional users share their own videos much faster than regular ones. Our results also include a new method to characterize different user segments in terms of YouTube categories, Twitter activity, and Twitter user bios. These allowed us to analyze the relation between demographic factors and the features of YouTube videos, including their amount of views. Our detailed statistical analysis reveals correlations between Twitter behavior and YouTube video content. In addition, our clustering analysis shows that the topic preferences of the two platforms are largely aligned. Our results on politics quantitatively show that politically right users are further from the center than politically left users, and among all video categories News & Politics correlates most with social and with sharing behavior. Our detailed analysis distinguishes which user types share videos earlier/later as well as which video classes are shared earlier/later. Finally, we developed a regression model for the effect

10 of early Twitter video shares by influential users on the final view count and we conclude by observing that second-order neighborhoods and retweet rates are much better predictors of ultimate video popularity than raw follower counts. 8. ACKNOWLEDGMENTS ETH received funding from the Swiss National Science Foundation under the grant CR21I1_ / 1. The authors would like to thank GNIP for providing data during a free trial period of their Powertrack product. 9. REFERENCES [1] F. Al Zamal, W. Liu, and D. Ruths. Homophily and latent attribute inference: Inferring latent attributes of twitter users from neighbors. In ICWSM, [2] S. Asur, B. A. Huberman, G. Szabo, and C. Wang. Trends in Social Media: Persistence and Decay. In ICWSM, pages , [3] A.-L. Barabasi. The origin of bursts and heavy tails in human dynamics. Nature, 435:207, [4] P. Barbera. Birds of the Same Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data, Forthcoming. [5] S. Bostandjiev, J. O Donovan, and T. Höllerer. Tasteweights: A visual interactive hybrid recommender system. In RecSys, pages 35 42, [6] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi. Measuring user influence in twitter: The million follower fallacy. In ICWSM, [7] M. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, F. Menczer, and A. Flammini. Political polarization on twitter. In ICWSM, [8] M. D. Conover, B. Gonc, A. Flammini, and F. Menczer. Partisan Asymmetries in Online Political Activity. EPJ DataScience, 1(6):1 17, [9] M. D. Conover, B. Goncalves, J. Ratkiewicz, A. Flammini, and F. Menczer. Predicting the Political Alignment of Twitter Users. In PASSAT-SocialCom, pages , [10] R. Crane, F. Schweitzer, and D. Sornette. Power law signature of media exposure in human response waiting time distributions. Physical Review E, 81(5):056101, [11] R. Crane and D. Sornette. Robust dynamic classes revealed by measuring the response function of a social system. PNAS, 105(41): , [12] F. Figueiredo, F. Benevenuto, and J. M. Almeida. The tube over time: Characterizing popularity growth of youtube videos. In WSDM, pages , [13] D. Garcia, P. Mavrodiev, and F. Schweitzer. Social resilience in online communities: The autopsy of friendster. 1st ACM Conference on Online Social Networks (Forthcoming), [14] D. Garcia, F. Mendez, U. Serdült, and F. Schweitzer. Political polarization and popularity in online participatory media. In PLEAD, pages 3 10, Nov [15] D. Gayo-Avello. Nepotistic relationships in twitter and their impact on rank prestige algorithms. Inf. Process. Manage., 49(6): , [16] S. Goel, J. M. Hofman, and M. I. Sirer. Who does what on the web: A large-scale study of browsing behavior. In ICWSM, [17] P. I. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer, 3 edition, [18] S. Havlin. The distance between zipf plots. Physica A Statistical and Theoretical Physics, 216(1-2): , [19] S. R. Kairam, M. R. Morris, J. Teevan, D. J. Liebling, and S. T. Dumais. Towards supporting search over trending events with social media. In ICWSM, [20] M. Karsai, K. Kaski, A.-L. Barabási, and J. Kertész. Universal features of correlated bursty behaviour, May [21] G. Karypis. Cluto: A clustering toolkit, [22] J. Kulshrestha, F. Kooti, A. Nikravesh, and P. K. Gummadi. Geographic dissection of the twitter network. In ICWSM, [23] M. S. S. Laine and G. Ercal. User Groups in Social Networks: An Experimental Study on YouTube. In th Hawaii International Conference on System Sciences, pages 1 10, Jan [24] J. Lehmann, B. Gonçalves, J. J. Ramasco, and C. Cattuto. Dynamical classes of collective attention in twitter. In WWW, page 251, New York, New York, USA, [25] Q. V. Liao, C. Wagner, P. Pirolli, and W.-T. Fu. Understanding experts and novices expertise judgment of twitter users. In CHI, pages , [26] A. Mislove, S. Lehmann, Y.-y. Ahn, J.-p. Onnela, and J. N. Rosenquist. Understanding the demographics of twitter users. In ICWSM, pages , [27] M. Pennacchiotti and A.-M. Popescu. Democrats, republicans and starbucks afficionados: User classification in twitter. In KDD, pages , [28] H. Pinto, J. M. Almeida, and M. A. Gonçalves. Using early view patterns to predict the popularity of youtube videos. In WSDM, pages , [29] L. Qiu, Q. Tang, and A. B. Whinston. Two formulas for success in social media: Social learning and network effects, Forthcoming. [30] D. Soysa, D. Chen, O. Au, and A. Bermak. Predicting youtube content popularity via facebook data: A network spread model for optimizing multimedia delivery. In CIDM, pages , [31] G. Szabo and B. A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80, [32] J. Teevan, D. Ramage, and M. R. Morris. #twittersearch: A comparison of microblog search and web search. In WSDM, pages 35 44, [33] A. Ulges, D. Borth, and M. Koch. Content analysis meets viewers: linking concept detection with demographics on youtube. IJMIR, 2(2): , [34] C. Wang and B. A. Huberman. Long trend dynamics in social media. EPJ Data Science, page 8, May [35] I. Weber, V. R. K. Garimella, and E. Borra. Political search trends. In SIGIR, pages ACM, [36] I. Weber, V. R. K. Garimella, and E. Borra. Inferring audience partisanship for youtube videos. In WWW, pages 43 44, [37] I. Weber, V. R. K. Garimella, and A. Teka. Political hashtag trends. In ECIR, pages , [38] I. Weber and A. Jaimes. Who uses web search for what: and how. In WSDM, pages 15 24, [39] Y. Wu, C. Zhou, J. Xiao, J. Kurths, and H. J. Schellnhuber. Evidence for a bimodal distribution in human communication. Proceedings of the National Academy of Sciences, 107(44): , Oct

Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership

Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership Adiya Abisheva Collaborators: Kiran V.R. Garimella, David Garcia, Ingmar Weber Motivation www.sg.ethz.ch

More information

Broadcast Yourself. User Guide

Broadcast Yourself. User Guide Broadcast Yourself User Guide 2015 American Institute of CPAs. All rights reserved. DISCLAIMER: The contents of this publication do not necessarily reflect the position or opinion of the American Institute

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Last Updated: 08/27/2013. Measuring Social Media for Social Change A Guide for Search for Common Ground

Last Updated: 08/27/2013. Measuring Social Media for Social Change A Guide for Search for Common Ground Last Updated: 08/27/2013 Measuring Social Media for Social Change A Guide for Search for Common Ground Table of Contents What is Social Media?... 3 Structure of Paper... 4 Social Media Data... 4 Social

More information

We provide the following resources online at http:// compstorylab.org/share/papers/dodds2014a/ and at

We provide the following resources online at http:// compstorylab.org/share/papers/dodds2014a/ and at S1 SUPPLEMENTARY INFORMATION Online, interactive visualizations: We provide the following resources online at http:// compstorylab.org/share/papers/dodds2014a/ and at http://hedonometer.org. Links to example

More information

Introduction Course in SPSS - Evening 1

Introduction Course in SPSS - Evening 1 ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/

More information

Business Process Services. White Paper. Social Media Influence: Looking Beyond Activities and Followers

Business Process Services. White Paper. Social Media Influence: Looking Beyond Activities and Followers Business Process Services White Paper Social Media Influence: Looking Beyond Activities and Followers About the Author Vandita Bansal Vandita Bansal is a subject matter expert in Analytics and Insights

More information

Marketing and Promoting Your Cooperative Through Social Media. How social media can be a success for your housing cooperative

Marketing and Promoting Your Cooperative Through Social Media. How social media can be a success for your housing cooperative Marketing and Promoting Your Cooperative Through Social Media How social media can be a success for your housing cooperative The History of Social Media First email sent Geocities, first social network

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Social Media Business Partner Advertising Kit. Click to add text. 2012 IBM Corporation

Social Media Business Partner Advertising Kit. Click to add text. 2012 IBM Corporation Social Media Business Partner Advertising Kit Click to add text 2012 IBM Corporation Social Media Advertising - Agenda Introduction Keyword Research Facebook Twitter LinkedIn YouTube Sample ad copy and

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Leveraging Social Media

Leveraging Social Media Leveraging Social Media Social data mining and retargeting Online Marketing Strategies for Travel June 2, 2014 Session Agenda 1) Get to grips with social data mining and intelligently split your segments

More information

The Ello social media network: Identifying the Joiners, Aspirers, and Detractors. November 2014 Insight Report using our DeepProfile capabilities

The Ello social media network: Identifying the Joiners, Aspirers, and Detractors. November 2014 Insight Report using our DeepProfile capabilities The Ello social media network: Identifying the Joiners, Aspirers, and Detractors November 2014 Insight Report using our DeepProfile capabilities About this Insight Report Disclaimer: Ello did not participate

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Appendix Figure 1 The Geography of Consumer Bankruptcy

Appendix Figure 1 The Geography of Consumer Bankruptcy Appendix Figure 1 The Geography of Consumer Bankruptcy Number of Bankruptcy Offices in Each Court Number of Chapter 13 Judges Chapter 13 Filings Per Capita Chapter 13 Discharge Rate Variation in Chapter

More information

Data Visualization Techniques

Data Visualization Techniques Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

More information

Social Media, How To Guide for American Express Merchants

Social Media, How To Guide for American Express Merchants Social Media, How To Guide for American Express Merchants americanexpress.com.au/merchant How to use Social Media successfully for small independent businesses 1 Contents Introduction - Page 3 1. What

More information

Predicting IMDB Movie Ratings Using Social Media

Predicting IMDB Movie Ratings Using Social Media Predicting IMDB Movie Ratings Using Social Media Andrei Oghina, Mathias Breuss, Manos Tsagkias, and Maarten de Rijke ISLA, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

More information

Data Visualization Techniques

Data Visualization Techniques Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

More information

Common Online Advertising Terms Provided by ZEDO, Inc.

Common Online Advertising Terms Provided by ZEDO, Inc. 3rd Party Ad Tag 3rd Party Redirect Action Action Tracking Tag Activity Ad Dimension Ad Hoc Report Ad Network Ad Tag Advanced Report Advertiser Advertiser Summary Report Advertiser Type Allocate per Ad

More information

A Measurement-driven Analysis of Information Propagation in the Flickr Social Network

A Measurement-driven Analysis of Information Propagation in the Flickr Social Network A Measurement-driven Analysis of Information Propagation in the Flickr Social Network Meeyoung Cha MPI-SWS Campus E1 4 Saarbrücken, Germany mcha@mpi-sws.org Alan Mislove MPI-SWS Campus E1 4 Saarbrücken,

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Socialbakers Analytics User Guide

Socialbakers Analytics User Guide 1 Socialbakers Analytics User Guide Powered by 2 Contents Getting Started Analyzing Facebook Ovierview of metrics Analyzing YouTube Reports and Data Export Social visits KPIs Fans and Fan Growth Analyzing

More information

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

More information

Data analysis process

Data analysis process Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

More information

Internet Access and Use: Does Cell Phone Interviewing make a difference?

Internet Access and Use: Does Cell Phone Interviewing make a difference? Internet Access and Use: Does Cell Phone Interviewing make a difference? By Evans Witt, Jonathan Best and Lee Rainie A paper prepared for The 2008 Conference of The American Association for Public Opinion

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

An Analysis of Twitter Users vs. Non-Users. An Insight Report Presentation Using DeepProfile Micro-Segmentation January 2014

An Analysis of Twitter Users vs. Non-Users. An Insight Report Presentation Using DeepProfile Micro-Segmentation January 2014 An Analysis of Twitter Users vs. Non-Users An Insight Report Presentation Using DeepProfile Micro-Segmentation January 2014 CivicScience s DeepProfile Project Goal To compare the key demographic, psychographic,

More information

Credit Card Market Study Interim Report: Annex 4 Switching Analysis

Credit Card Market Study Interim Report: Annex 4 Switching Analysis MS14/6.2: Annex 4 Market Study Interim Report: Annex 4 November 2015 This annex describes data analysis we carried out to improve our understanding of switching and shopping around behaviour in the UK

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

B2B Social Media + Email Marketing: Rock Solid Strategies For Doing It Right!

B2B Social Media + Email Marketing: Rock Solid Strategies For Doing It Right! Brought to you by: ExtremeDigitalMarketing.com B2B Social Media + Email Marketing: Rock Solid Strategies For Doing It Right! Businesses Connecting With Businesses Through The Power Of Social Media! By

More information

Twitter FREE GUIDE Provided by: Unleash Twitter

Twitter FREE GUIDE Provided by: Unleash Twitter Provided by: Copyright Oliver James Enterprise Unleash the power of Twitter FREE GUIDE Unleash Twitter 1 This guide will teach you: Twitter s features and terminology How to formulate a content strategy

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

More information

A Brief History About Social Media

A Brief History About Social Media Table Of Contents Executive Summary...3 A Brief History About Social Media...4 Evaluating Social Media Tools...5 Facebook...5 Twitter...6 MySpace...6 YouTube...7 Blogging...8 LinkedIn...8 Conclusion...10

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Using Social Media to Improve Your SEO

Using Social Media to Improve Your SEO Using Social Media to Improve Your SEO Brent D. Payne SEO & Social Media Director 435 Digital SEO 101: Popularity Popularity Links Authority Trust Popularity Relevance Similar Content Authority Social

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Automatic measurement of Social Media Use

Automatic measurement of Social Media Use Automatic measurement of Social Media Use Iwan Timmer University of Twente P.O. Box 217, 7500AE Enschede The Netherlands i.r.timmer@student.utwente.nl ABSTRACT Today Social Media is not only used for personal

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

More information

AUSTRALIAN MULTI-SCREEN REPORT QUARTER 3 2014

AUSTRALIAN MULTI-SCREEN REPORT QUARTER 3 2014 AUSTRALIAN MULTI-SCREEN REPORT QUARTER 3 TV AND OTHER VIDEO CONTENT ACROSS MULTIPLE SCREENS The edition of The Australian Multi-Screen Report provides the latest estimates of screen technology penetration

More information

IBM SPSS Data Preparation 22

IBM SPSS Data Preparation 22 IBM SPSS Data Preparation 22 Note Before using this information and the product it supports, read the information in Notices on page 33. Product Information This edition applies to version 22, release

More information

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?*

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?* Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?* Jennjou Chen and Tsui-Fang Lin Abstract With the increasing popularity of information technology in higher education, it has

More information

IBM Social Media Analytics

IBM Social Media Analytics IBM Social Media Analytics Analyze social media data to better understand your customers and markets Highlights Understand consumer sentiment and optimize marketing campaigns. Improve the customer experience

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

INTERNET MARKETING. SEO Course Syllabus Modules includes: COURSE BROCHURE

INTERNET MARKETING. SEO Course Syllabus Modules includes: COURSE BROCHURE AWA offers a wide-ranging yet comprehensive overview into the world of Internet Marketing and Social Networking, examining the most effective methods for utilizing the power of the internet to conduct

More information

IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

More information

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

More information

Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information

ONLINE INSIGHTS. Online Video Content & Advertising. Video Preferences, Habits and Actions in Q4 2011. October 2011.

ONLINE INSIGHTS. Online Video Content & Advertising. Video Preferences, Habits and Actions in Q4 2011. October 2011. Online Video Content & Advertising October 2011 Copyright 2011 WWW.BURSTMEDIA.COM Overview: Online Video Content & Advertising Online video has enjoyed exponential growth in 2011. According to comscore,

More information

Digital media glossary

Digital media glossary A Ad banner A graphic message or other media used as an advertisement. Ad impression An ad which is served to a user s browser. Ad impression ratio Click-throughs divided by ad impressions. B Banner A

More information

2011 Cell Phone Consumer Attitudes Study

2011 Cell Phone Consumer Attitudes Study 2011 Cell Phone Consumer Attitudes Study Prepared for: CWTA April 29, 2011 Copyright 2009-2012 Quorus Consulting Group Ltd. Table of Contents Executive Summary 3 Research Objectives and Methodology 9 Detailed

More information

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test. The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

More information

Comparing Recommendations Made by Online Systems and Friends

Comparing Recommendations Made by Online Systems and Friends Comparing Recommendations Made by Online Systems and Friends Rashmi Sinha and Kirsten Swearingen SIMS, University of California Berkeley, CA 94720 {sinha, kirstens}@sims.berkeley.edu Abstract: The quality

More information

Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

More information

Social Media Mining. Network Measures

Social Media Mining. Network Measures Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Best Practices of. Marketing

Best Practices of. Marketing Term Paper E Business Technologies Best Practices of Marketing Master in Business Consulting Winter Semester 2010 Submitted to Prof.Dr. Eduard Heindl Prepared by Venkata Ramesh Kumar Gadhamsetty Matriculation

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

Figure 1. An embedded chart on a worksheet.

Figure 1. An embedded chart on a worksheet. 8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the

More information

9. Sampling Distributions

9. Sampling Distributions 9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

MEMBER SOCIAL MEDIA SETUP GUIDEBOOK

MEMBER SOCIAL MEDIA SETUP GUIDEBOOK MEMBER SOCIAL MEDIA SETUP GUIDEBOOK I n t r o d u c t i o n The use of social media to support Have the Talk of a Lifetime SM Social media has become a part of everyone s life and provides a powerful platform

More information

Committee guide to club features on UWESU Website Version 1.1

Committee guide to club features on UWESU Website Version 1.1 Committee guide to club features on UWESU Website Version 1.1 This guide will help you get the best out of your club pages, whether you are a sport, society or network. 1 Logging on and gaining access

More information

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow WHITE PAPER: PREDICTIVE CODING DEFENSIBILITY........................................ Predictive Coding Defensibility and the Transparent Predictive Coding Workflow Who should read this paper Predictive

More information

Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

More information

Easy Strategies for using Content (Ctrl) in your Email Marketing Today www.contentctrl.com

Easy Strategies for using Content (Ctrl) in your Email Marketing Today www.contentctrl.com Field Guide to the Social Email (R )Evolution Easy Strategies for using Content (Ctrl) in your Email ing Today www.contentctrl.com Welcome To The Party You ve added a social sharing button to your email

More information

It s On Us Social Media Measurement Plan

It s On Us Social Media Measurement Plan It s On Us Social Media Measurement Plan Background The It s On Us campaign is an initiative developed by the White House, stemming from the creation of the White House Task Force to Protect Students from

More information

Quick Guide to Getting Started: Twitter for Small Businesses and Nonprofits

Quick Guide to Getting Started: Twitter for Small Businesses and Nonprofits Quick Guide to Getting Started: Twitter for Small Businesses and Nonprofits Social Media www.constantcontact.com 1-866-876-8464 Insight provided by 2011 Constant Contact, Inc. 11-2168 What is Twitter?

More information

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow Predictive Coding Defensibility and the Transparent Predictive Coding Workflow Who should read this paper Predictive coding is one of the most promising technologies to reduce the high cost of review by

More information

MEASURING GLOBAL ATTENTION: HOW THE APPINIONS PATENTED ALGORITHMS ARE REVOLUTIONIZING INFLUENCE ANALYTICS

MEASURING GLOBAL ATTENTION: HOW THE APPINIONS PATENTED ALGORITHMS ARE REVOLUTIONIZING INFLUENCE ANALYTICS WHITE PAPER MEASURING GLOBAL ATTENTION: HOW THE APPINIONS PATENTED ALGORITHMS ARE REVOLUTIONIZING INFLUENCE ANALYTICS Overview There are many associations that come to mind when people hear the word, influence.

More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

Social Media Glossary of Terms For Small Business Owners

Social Media Glossary of Terms For Small Business Owners Social Media Glossary of Terms For Small Business Owners Introduction As a small business, reaching your audience efficiently and cost-effectively way is critical to your success. Social media platforms

More information

Essential Communication Methods for Today s Public Social Media 101. Presented by Tom D. Trimble, CIO Tulsa County Government

Essential Communication Methods for Today s Public Social Media 101. Presented by Tom D. Trimble, CIO Tulsa County Government Essential Communication Methods for Today s Public Social Media 101 Presented by Tom D. Trimble, CIO Tulsa County Government What you will learn today Why Social Media is important to you Today s Social

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Measure Social Media like a Pro: Social Media Analytics Uncovered SOCIAL MEDIA LIKE SHARE. Powered by

Measure Social Media like a Pro: Social Media Analytics Uncovered SOCIAL MEDIA LIKE SHARE. Powered by 1 Measure Social Media like a Pro: Social Media Analytics Uncovered # SOCIAL MEDIA LIKE # SHARE Powered by 2 Social media analytics were a big deal in 2013, but this year they are set to be even more crucial.

More information

Data Visualization Handbook

Data Visualization Handbook SAP Lumira Data Visualization Handbook www.saplumira.com 1 Table of Content 3 Introduction 20 Ranking 4 Know Your Purpose 23 Part-to-Whole 5 Know Your Data 25 Distribution 9 Crafting Your Message 29 Correlation

More information

Outlook special Behavioural shifts in target audiences

Outlook special Behavioural shifts in target audiences Outlook special Behavioural shifts in target audiences September Where can you reach your customer in? Behavioural shifts in target audiences The future media consumption of five generations explored Introduction

More information

Role of Social Networking in Marketing using Data Mining

Role of Social Networking in Marketing using Data Mining Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data.

Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data. 1 of 9 12/8/2014 12:57 PM (an On-Line Internet based application) Instructions: Please fill out the information into the form below. Once you have entered your data below, you may select the types of analysis

More information

Web Search Engines: Solutions

Web Search Engines: Solutions Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

More information

Media Channel Effectiveness and Trust

Media Channel Effectiveness and Trust Media Channel Effectiveness and Trust Edward Paul Johnson 1, Dan Williams 1 1 Western Wats, 701 E. Timpanogos Parkway, Orem, UT, 84097 Abstract The advent of social media creates an alternative channel

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information