Progress Report. Social Media Analytics




COMP60990 Research Methods and Professional Skills
Progress Report
Social Media Analytics
8th May 2015
MARCUS KIN ING LEUK
M.Sc. Advanced Computer Science and IT Management
Supervised by: Dr. Ilias Petrounias

Abstract

Online social media is an emerging form of communication and interaction between people. It allows people to publish brief messages and share information with others on platforms such as Facebook and Twitter. This has increased the importance of online social influence. Social influence is the ability to affect one's thoughts, emotions, or behaviours. Individuals in the online community with this ability are called social media influencers, and they are often used to spread opinions and market products. The main objective of the project is to identify topic-specific influential members in the Twitter context. Our study reports important findings, discusses various approaches and addresses the problems faced by other researchers while identifying influencers. We also propose a framework that consists of a list of criteria to identify influencers. This set of criteria was defined by looking at influencers in both the traditional and the online social network context. Our work starts by identifying a topic on Twitter and collecting a dataset of Twitter user accounts related to that topic. We will then select only personal accounts for further analysis. Next, we will apply our framework to rank the users in terms of their influence on the specific topic. Finally, we will evaluate our framework by applying approaches proposed by other researchers to the same data and comparing the resulting influencers.

Table of contents

Abstract
Table of contents
1. Introduction
1.1 Aims and objectives
1.2 Report layout
2. Background
2.1 Measuring influence on Twitter
2.2 Motivation
3. Literature Review
3.1 Criteria of social media influencers
3.2 Approaches for identifying influencers
3.3 Outcome
3.4 Concluding remarks
3.5 Importance of our work
4. Project Progress
4.1 Data collection
4.2 Classification of Twitter user accounts
4.3 Proposed framework
4.4 Evaluation
5. Project Plan
6. Conclusion
References

1. Introduction

The creation of Web 2.0, which emphasises user-generated content, usability and interoperability, has turned former online information readers into information producers. The main feature of Web 2.0 is that it allows users to interact with one another in the social media network (O'Reilly 2009). This led to the creation of many social networking sites such as Facebook, Twitter, Google+ and LinkedIn. Besides giving millions of people the opportunity to communicate, these sites led to the creation of online communities, which encourage people to share information and exchange opinions on common topics. Nowadays, it is common practice for people to share their thoughts and read the opinions of others on a wide range of topics, from reviewing products and discussing politics to expressing personal emotions.

1.1 Aims and objectives

The main purpose of the project is to develop a suitable approach to identify topic-specific influential users on the Twitter network. To do this, a set of criteria that characterise an influencer has to be identified, which will be used to develop a theoretical framework. Then, a manageable dataset of user accounts related to a specific topic on Twitter will be collected and used to evaluate the proposed framework. The final results should identify users that are influential on a specific topic on Twitter.

1.2 Report layout

The remainder of this report is structured as follows. Chapter 2 covers essential background knowledge specific to the project and the motivation for undertaking it. Chapter 3 reviews multiple approaches for identifying social media influencers covered in the literature and describes the importance of our work. Chapter 4 describes the whole process of the project, including the initial proposed framework and the evaluation plan. Chapter 5 presents the plans for completing the project. Finally, Chapter 6 concludes the report.

2. Background

This chapter briefly covers the background knowledge specific to the project, including the reason for choosing Twitter as a research platform and the motivation for undertaking the project. Related work in the field is discussed in Chapter 3.

2.1 Measuring influence on Twitter

As of March 2015, 70% of Internet users have active social media accounts. Studies showed that Facebook and Twitter have an estimated 1,415 million and 288 million monthly active user accounts respectively, which makes them two of the most popular social media platforms (Kemp 2015). Twitter is a micro-blogging service that allows users to publish brief messages (tweets) and to include links to other websites in a message. Its most attractive feature is the 140-character limit for each tweet, which encourages users to keep tweets short and capture only the important points of a topic. Unlike other social networking sites such as Facebook, where both parties have to agree in order to become friends, Twitter uses the concept of following: users may follow anyone they want without needing the other party to follow them back (Weng, Lim et al. 2010). These features make Twitter an excellent platform for marketers to launch effective marketing campaigns by targeting influencers. Hence, Twitter was chosen as the research platform for the project.

2.2 Motivation

Word-of-Mouth (WOM) is an informal communication behaviour in which consumers exchange experiences about specific products and services with each other (Westbrook 1987). In the online community, WOM is also known as viral marketing. Since social media are used extensively nowadays, and WOM has such a great impact on purchasing decisions, marketers could use viral marketing methods to spread knowledge about their brand, products or services among customers at a low marketing cost. However, customers can be overwhelmed by the massive amount of reviews and opinions generated by online communities. Customers choose for themselves which opinions they consider trustworthy, so it is important for marketers to accurately identify suitable opinion leaders or influencers for the company.

The Oxford dictionary defines influence as the capacity to have an effect on the character, development, or behaviour of someone or something. Feng et al. (2011) described social influence as the power possessed by a person to have an effect on the thoughts or actions of others. According to the research done by Wu et al. (2010), more than half of the content on Twitter was contributed by only 0.05% of the Twitter population. This group of people are online influencers, most of whom are celebrities, politicians and the news media. It was found that most information was produced by the news media, whereas celebrities and politicians had the most followers. Most other Twitter users have less influence, as they often only read and share information created by the influencers.

Influencers are very important to businesses because they are able to use WOM to cause a chain reaction in which information spreads quickly across a wide audience. Like a virus, this marketing strategy takes advantage of rapid multiplication to spread the marketing message to millions of people (Wilson 2000). By identifying and persuading these influencers, companies are able to market their products to more people in a shorter time, with minimal marketing cost. As mentioned by Domingos (2005), in traditional marketing, customers receive offers only when the expected profit is more than the cost of the offer, whereas in viral marketing, offering products to influencers for free could return many times the cost in sales to other customers. Therefore, it is important for companies to realise that online influencers can be essential assets to their business.

3. Literature Review

This chapter describes related work done by others, including a comparison and discussion of existing techniques for identifying influencers. We identify areas that were overlooked by others and describe the importance of our work. The details of our approach are discussed further in Chapter 4.

3.1 Criteria of social media influencers

To search for influencers, it is important to start by identifying a set of criteria that makes a person an influencer. Keller and Berry (2003), Akritidis et al. (2011) and Agarwal et al. (2008) had similar views on some of the characteristics that influencers should have, which are summarised below:

Activity generation - An influencer's post should be able to generate activity by initiating discussions among others. This can be measured by the number of comments that the post receives. If a post receives a large number of comments, many people have spent time reading and exchanging thoughts about it, which indicates that it may have significant influence on them (Agarwal, Liu et al. 2008).

Recognition - An influencer's post should be recognised by many. This can be measured indirectly by the number of inlinks (references to the post in other posts). A high number of inlinks suggests that the post is recognised by many people. If the referring posts are themselves highly influential, the referred post becomes even more influential (Agarwal et al., 2008).

Eloquence - Influencers are usually expressive and persuasive. They are the people others look up to for advice. They believe that WOM is more important than traditional media, and they are not afraid of sharing their opinions on what they like or dislike (Keller and Berry 2003). Normally, these traits can be seen in the quality of the user's posts. The quality of a post can be measured in various ways, such as vocabulary usage, fluency, and content analysis. However, these properties are difficult to analyse due to the informal nature of most social networks. Instead, Akritidis et al. (2011) and Agarwal et al. (2008) used the length of a post to determine whether it is influential. They argued that users have no reason to write lengthy posts that bore their readers, so a lengthy post often indicates some necessity to do so.

Novelty - As suggested by Keller and Berry (2003), innovative and unique posts exert more influence. Influencers should always be optimistic and have a high tendency to accept new things. Influencers are also often trendsetters, because they normally acquire information before others and like to share it with their followers. To determine whether a post is novel, we can look at the number of outlinks (the other posts that it refers to). A smaller number of outlinks indicates that the post refers to few or no other posts or articles, which means that it is more likely to be novel. The number of comments is also correlated with the number of outlinks, as more novel posts attract the attention of more people (Agarwal, Liu et al. 2008).

The four criteria stated above are basic properties of an influencer's post. Zhou et al. (2009) suggested some other criteria, which include the activeness of the user and the date they joined social media. It was believed that users who are very active and have been on social media for a long time are more likely to be considered influencers. The findings of Agarwal et al. (2008), however, conflict with those of Zhou et al. (2009). Agarwal et al. divided social media users into four groups: influential and active, influential but inactive, not influential but active, and not influential and inactive. Based on their findings, active social media users are not necessarily influential, and influential users may be inactive (Agarwal, Liu et al. 2008).

However, regardless of a user's activeness and how long they have been on social media, Akritidis et al. (2011) claimed that any user could rise to become an influencer if they recently had several influential posts that had an impact on the online community. This claim was also backed up by Agarwal et al. (2008), who categorised influencers into four groups based on their temporal patterns:

Long-term influencers - users that are able to maintain their influential status for a very long time.
Average-term influencers - users that are influential for 4-5 months.
Transient influencers - users that can maintain their influential status for only 1-2 months.
Burgeoning influencers - users who have recently emerged as influencers and might become any of the three types stated above in the future.

In fact, the duration of influence indirectly indicates the user's reputation and knowledge. For instance, long-term influencers are often the users that gain the highest level of trust from others, and they may also be viewed as experts in their particular field. Therefore, long-term influencers are considered the most influential of the four groups.
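The four temporal groups above can be sketched as a small classification function. This is only an illustration: the names `InfluencerType` and `classify_influencer` are hypothetical, the cut-off for "a very long time" (more than 5 months) is an assumption, and so is the choice to treat the unspecified 2-4 month range as average-term.

```python
from enum import Enum


class InfluencerType(Enum):
    LONG_TERM = "long-term"
    AVERAGE_TERM = "average-term"
    TRANSIENT = "transient"
    BURGEONING = "burgeoning"


def classify_influencer(months_influential: float, recently_emerged: bool) -> InfluencerType:
    """Map the duration of a user's influential status to a temporal group.

    Thresholds are assumptions for illustration; the text only states
    1-2 months (transient) and 4-5 months (average-term).
    """
    if recently_emerged:
        # Newly emerged influencers may later move into any other group.
        return InfluencerType.BURGEONING
    if months_influential <= 2:
        return InfluencerType.TRANSIENT
    if months_influential <= 5:
        # Assumption: the unspecified 2-4 month range counts as average-term.
        return InfluencerType.AVERAGE_TERM
    # Assumption: anything beyond 5 months counts as "a very long time".
    return InfluencerType.LONG_TERM


print(classify_influencer(1, False))    # InfluencerType.TRANSIENT
print(classify_influencer(4.5, False))  # InfluencerType.AVERAGE_TERM
print(classify_influencer(12, False))   # InfluencerType.LONG_TERM
```

In practice the duration would have to be estimated from timestamped activity, which this sketch does not attempt.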

Alternatively, users can be influencers if they are well connected with other users. This can be measured by the degree of centrality. Users who possess a high outdegree of centrality (the number of direct connections the user has to others) are expected to be influencers, because their many connections make them stand out from others. It is beneficial for users to have many connections because they are able to access more resources and find alternative ways to satisfy their needs, which also makes them less dependent on others (Hanneman and Riddle 2005). Ya-ting and Jing-min (2011) divided centrality into three categories: degree centrality, betweenness centrality and closeness centrality. Degree centrality measures all the direct connections of the user, which indicates the user's ability to interact with others. It is important to note that the user's direct connections alone do not determine the level of influence. Betweenness centrality measures how strategic the user's position is in the network, i.e., the user is in a good position if he is on the shortest path between others. This indicates how well the user is able to control resources. Lastly, closeness centrality indicates how independent a user is of others. If a user is less dependent on others, he has a higher degree of centrality because other users often depend on him.

In addition, empirical evidence gathered by researchers suggests that most influencers exhibit specific behaviours and that their influential status was not gained accidentally. Quercia et al. (2011) mentioned that a user's level of influence is determined by the level of user involvement and audience engagement. Cha et al. (2010) added that influence can be gained through concerted effort, whereas personal involvement is crucial for maintaining influence. They also concluded that anyone can gain influence by focusing on a single topic and publishing creative, unique and insightful posts.

In social media, the use of language in one's posts is closely linked to social influence. Influencers must have good communication skills, and their choice of words should always be persuasive in order to convince their followers. Quercia et al. (2011) found that influencers often structure their tweets in a similar linguistic manner. Most of them also include negative sentiments in their posts. Furthermore, linguistic qualities can reflect the user's emotions and personality. Thus, it is important to look into linguistic features while identifying influencers.

3.2 Approaches for identifying influencers

Over the past few years, various tools and techniques have been created for detecting influencers. Some examples of widely used tools are Simply Measured, Twtrland, Followerwonk and Klout. These tools are popular because of their attractive graphical presentation of the data and their user-friendly interfaces. Most of them are also free and readily available. Despite these benefits, the results they produce are still not convincing enough. This is because their analysis methods are often based on quantitative measures alone, such as the number of tweets posted, the number of followers and the number of retweets. Features such as the quality of posts and linguistic structure are not taken into consideration.

To show that the number of retweets is not sufficient to determine influence, Hubspot conducted research on more than 2.7 million tweets that contained a link to another website. The results showed that most users retweet without even clicking on the link to look at the content they are retweeting, which means that information is often passed on blindly (Bennett 2012). The frequency of posts is also not reliable in determining influence, because numerous tools now allow tweets to be posted to Twitter automatically at a preset frequency.

In addition, the French Huffington Post conducted an experiment to show that organisations and individuals may buy fake followers to boost their popularity and gain trust from customers. In the experiment, they bought more than 50,000 followers with a budget of only 33 euros (Provost 2012). This shows how simple and cheap it is to buy followers. Furthermore, it is common to assume that the level of impact of a user is determined by the number of followers he has. The assumption is true only if all published tweets are read by all the followers. Research has also shown that there is only a weak correlation between popularity and influence. For information to spread across a network, it is important for an individual to have followers who actively forward information to others, rather than passive readers who do nothing (Romero, Galuba et al. 2011). As a result, many marketing professionals still favour the manual approach for identifying influencers instead of relying on tools, even though it may be time consuming.

To accurately identify influencers without relying on the tools described above, various techniques have been applied to raw data collected using the Twitter Application Program Interface (API). Cha et al. (2010) made an in-depth comparison between three activities that represent different types of influence of a person:

Indegree influence - measured by the number of followers of a user. The number of followers determines the size of the audience for that user.
Retweet influence - measured by the number of retweets that contain one's name. The number of retweets indicates the user's ability to create quality posts that others think are worth sharing.
Mention influence - measured by the number of mentions that contain one's name. This indicates how well the user can engage others in a conversation.
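The three measures above can be sketched as simple counts over a small in-memory dataset. This is an illustrative sketch, not the procedure used by Cha et al. (2010): the function name `influence_counts` and the input shapes are hypothetical, a tweet is treated as a retweet only if it starts with "RT @name", and mentions inside retweets are ignored. Both rules are simplifying assumptions.

```python
import re
from collections import Counter

# Assumption: a retweet is any tweet whose text starts with "RT @name".
RETWEET = re.compile(r"RT @(\w+)")
MENTION = re.compile(r"@(\w+)")


def influence_counts(followers, tweets):
    """Return (indegree, retweets, mentions) counts per user.

    followers: dict mapping a user name to the set of users following them.
    tweets: list of (author, text) pairs.
    """
    indegree = {user: len(fans) for user, fans in followers.items()}
    retweets = Counter()
    mentions = Counter()
    for _author, text in tweets:
        rt = RETWEET.match(text)
        if rt:
            # Credit the original author; skip mention counting for retweets.
            retweets[rt.group(1)] += 1
            continue
        # Count each mentioned user at most once per tweet.
        for name in set(MENTION.findall(text)):
            mentions[name] += 1
    return indegree, retweets, mentions


# Hypothetical sample data for illustration only.
followers = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": set()}
tweets = [
    ("bob", "RT @alice: new phone review"),
    ("carol", "RT @alice: new phone review"),
    ("alice", "@bob thanks for sharing"),
    ("carol", "@bob @alice what do you think?"),
]
indegree, retweets, mentions = influence_counts(followers, tweets)
print(indegree)        # {'alice': 2, 'bob': 1, 'carol': 0}
print(dict(retweets))  # {'alice': 2}
print(mentions["bob"], mentions["alice"])  # 2 1
```

Ranking users by each of the three counts separately would then allow the rankings to be compared, which is the kind of comparison discussed in the literature.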

Many other approaches have been used by researchers to identify influencers. Kwak et al. (2010) carried out similar experiments in which they ranked influencers based on PageRank, the number of followers (indegree) and the number of retweets. The PageRank algorithm was first used by Google to rank web pages in its search results (Page, Brin et al. 1999). PageRank was used to measure influence because the influence of a user is similar to the concept of authority of a web page: a Twitterer has high influence if the sum of the influence of his followers is high; at the same time, his influence on each follower is determined by the relative amount of content the follower received from him (Weng, Lim et al. 2010).

Weng et al. (2010) criticised PageRank for ignoring the topical interests of Twitter users, which affect how they influence others. They argued that most users may not read tweets on topics that do not interest them, so influence varies according to topic. Instead, they proposed an approach that takes into account both topical similarity and the link structure among Twitter users, which they called TwitterRank. The TwitterRank algorithm starts by generating a directed graph that shows the following relationships between users, and uses a random surfer to visit each connected user based on a transition probability. The transition probability is calculated from the common topical interest between two users: the more topical interest the two users share, the higher the transition probability. By repeating the process, a topic-specific relationship between the users can be constructed.

Furthermore, the report by Leavitt et al. (2009) suggested that using the number of followers alone to determine influence is acceptable only if Twitter is treated as a normal broadcast medium and we ignore the fact that Twitter users are able to interact with the content on the platform. Therefore, they proposed a measure that better explains how influence occurs in the Twitter network: the ratio of followers to followees.
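To make the PageRank idea concrete, the following is a minimal power-iteration sketch on a follower graph, where each user passes a share of their rank to the users they follow. It is plain PageRank, not TwitterRank: it uses no topic-based transition probabilities. The function name `pagerank`, the damping factor of 0.85, the iteration count and the sample graph are all assumptions chosen for illustration.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Rank users on a follower graph by power iteration.

    follows: dict mapping a user to the list of users they follow.
    An edge follower -> followee passes rank to the followee.
    """
    users = set(follows) | {v for vs in follows.values() for v in vs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who follow nobody spread their rank evenly over everyone.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for user, followees in follows.items():
            if followees:
                share = damping * rank[user] / len(followees)
                for followee in followees:
                    new_rank[followee] += share
        rank = new_rank
    return rank


# Hypothetical graph: a, b and d all follow c; c follows a.
follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(follows)
for user in sorted(rank, key=rank.get, reverse=True):
    print(user, round(rank[user], 3))
```

In this sample graph, c is ranked highest because it has the most followers, and a comes second because its single follower (c) is itself highly ranked, which mirrors the "sum of influence of his followers" intuition described above.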

3.3 Outcome

Research conducted by Weng et al. (2010) showed that 72.4% of Twitter users follow more than 80% of their followers, and 80.5% of Twitter users have 80% of their friends follow them back. Based on these results, it is clear that reciprocity exists in the Twitter context, and they suggested two reasons to explain it. Firstly, it might be casual following, because it is so easy to follow someone on Twitter and some users follow others back simply out of etiquette. Secondly, it might be homophily, where Twitter users follow each other because they have similar topical interests. McPherson et al. (2001) explained this phenomenon as a principle in which influence and connection between similar people (i.e., similar culture, background, characteristics or interests) occur at a higher rate than among dissimilar people, and it is present in many social networks. If most Twitter users follow others for the first reason, then it is not reliable to use the number of followers to determine influence. However, their research has shown that homophily does exist in the Twitter context, which means that some users choose who to follow seriously and only follow people with similar topical interests.

Cha et al. (2010) found that the most followed users were politicians, celebrities and news sources. Therefore, they suggested that these were the users to look for if a lot of attention is needed from a wide audience. Retweets, on the other hand, represent influence beyond the one-to-one interaction between users. They found that the most retweeted users were businessmen, news sources and content aggregation services. They also suggested that retweeting is a powerful method to emphasise a message: the probability that people accept new ideas becomes higher as the number of people who repeat the same message increases (Watts and Dodds 2007). Lastly, the users that were mentioned the most were often celebrities. This is because celebrities have many fans and celebrity gossip is always a popular topic on Twitter; therefore, they are considered the centre of public attention (Cha, Haddadi et al. 2010).

Cha et al. (2010) also found a strong correlation between retweet influence and mention influence. This means that users who are retweeted frequently are also often mentioned, and vice versa. Indegree, on the other hand, was not related to the other measures, which explains why having a lot of followers does not make a user an influencer who is good at engaging in conversations and spreading information. In addition, most influencers are influential over a variety of topics, which means that they are popular opinion leaders who can be relied on to spread information even if they are not experts in certain areas. It is also more effective to target top influencers in social media to start a viral marketing campaign than to employ a large number of non-popular users (Cha, Haddadi et al. 2010).

Kwak et al. (2010) reported that PageRank and indegree ranked the influencers in a similar manner, whereas the ranking by number of retweets was different. As mentioned in Chapter 3.2, PageRank was developed with ranking web pages as its main purpose. Furthermore, experimental results showed that TwitterRank outperformed PageRank in identifying influencers (Weng, Lim et al. 2010).

3.4 Concluding remarks

Relying on the number of followers to identify influencers is a common misconception among marketers. In fact, this approach reveals only the popularity of a person. Moreover, research found that followers can easily be bought at low cost. Therefore, the number of followers should only be taken as a contributing factor to a person's degree of influence.

Many features of influencers have been identified, which boil down to four main criteria: activity generation, recognition, eloquence and novelty. Other criteria, such as the user's activeness and the length of time since the user joined social media, were also suggested. However, influential users are not always active users, and vice versa. The time span over which a user successfully maintains an influential status helps others determine the user's trustworthiness. Furthermore, influencers should have a reasonable amount of user involvement and frequent audience engagement, and they should be well connected with others. Linguistic features should also be taken into consideration, as they reveal the user's emotions and personality.

In short, there are certainly other potential criteria of an influencer besides the ones described above. It is clear that each criterion on its own is insufficient to identify influencers accurately. Therefore, they will be used jointly in the project.

3.5 Importance of our work

Most approaches developed by others focused on finding influencers based on obvious data that could be collected from Twitter, such as the number of tweets, retweets, indegree and mentions. For example, Cha et al. (2010) used only the number of retweets and mentions to determine influencers, whereas others used algorithms such as PageRank and TwitterRank. These algorithms were complex and were developed with ranking web pages in mind. We think that the characteristics of influencers are far more complicated, and most algorithms do not take these characteristics into account. A common mistake made by most researchers was to identify influencers based on separate criteria. In addition, while identifying the list of criteria, most researchers limited their thinking to the data that Twitter can provide, hence overlooking some important aspects.

Instead, we looked at influence in a traditional context, assuming that the Internet does not exist, and focused on what makes an individual influential. It was found that people are more easily influenced by family members than by strangers, and the main reason is always trust. For example, most children will go to their parents for advice and do what their parents ask without hesitation, because they know that their parents will not harm them. This phenomenon also applies to adults, even though it is less obvious. A recent example is the huge fashion influence of Kate Middleton, i.e., when she wears a dress, consumers will go to stores to buy the same dress. It is suggested that influencers are trusted by others because of their experience and knowledge on a specific matter. In the traditional social network context, Tedeschi et al. (1972) suggested that an influencer should possess the characteristics of a leader, which include self-confidence, self-control and the need for achievement. The need for power was also found to be related to influence (Mowday 1978). Using this approach, we came up with a list of criteria for influencers in the traditional social network context.

We also looked at influencers in online social media. It was found that there are some differences, and potential features that could be added to the list of criteria. For example, user activeness is important because on Twitter there are millions of followers and followees who do not physically meet and talk to each other. Influencers therefore need to be active in sharing their opinions with others to gain recognition.

In short, other related work looked at influencers from the online social media context only. Our work is different because we combined the insights from both the traditional and online social networks, which led to an expanded set of criteria to identify influencers. These criteria form the basis of our work, which is shown in our proposed framework in Chapter 4.3.

4. Project Progress

This chapter describes the progress of the project so far and the overall process of identifying influencers. First, a dataset of user accounts related to a specific topic has to be collected from Twitter. Then, the accounts have to be filtered and classified. After that, users will be ranked according to their degree of influence, based on the techniques proposed in our framework. Finally, the result will be evaluated to ensure that the influencers are identified correctly.

4.1 Data collection

Twitter has 302 million monthly active user accounts, and an average of 500 million tweets are posted per day. In order to identify influencers for a specific topic, we first need to determine a fairly recent topic on Twitter and retrieve a manageable dataset of users who actively discuss that topic. These data can be collected using the REST API provided by Twitter. The API allows developers to read and write Twitter data, which includes reading user profiles and follower data. Searching for the keywords of a specific topic enables us to retrieve a collection of tweets related to that topic. Other information about the users is retrieved together with the tweets. Table 1 shows a sample of the data collected using the API, with a brief explanation of a few important fields.
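As a rough illustration of how one retrieved record might be turned into a flat row of the fields listed in Table 1, the sketch below parses a single JSON record with the Python standard library. It makes no API call. The record itself is made up, and the nesting of account fields under a `user` key, the `flatten` helper and the `USER_FIELDS` tuple are assumptions made for this example rather than part of the collected dataset.

```python
import json
from datetime import datetime

# Hypothetical record shaped like one search result; the values are made up.
RAW = """
{
  "created_at": "Mon Jan 26 22:17:07 +0000 2015",
  "id": 1,
  "text": "example tweet #iphone",
  "in_reply_to_status_id": null,
  "user": {"name": "example", "location": null,
           "followers_count": 120, "friends_count": 80,
           "listed_count": 2, "favourites_count": 15,
           "statuses_count": 900, "lang": "en"}
}
"""

# Account-level fields copied into every row (assumed to sit under "user").
USER_FIELDS = ("name", "location", "followers_count", "friends_count",
               "listed_count", "favourites_count", "statuses_count", "lang")


def flatten(record: dict) -> dict:
    """Return one flat row holding the tweet fields and the author's fields."""
    user = record.get("user", {})
    row = {
        "id": record["id"],
        "text": record["text"],
        # Timestamps look like "Mon Jan 26 22:17:07 +0000 2015".
        "created_at": datetime.strptime(record["created_at"],
                                        "%a %b %d %H:%M:%S %z %Y"),
        # A null in_reply_to_status_id means the tweet is not a reply.
        "is_reply": record.get("in_reply_to_status_id") is not None,
    }
    row.update({field: user.get(field) for field in USER_FIELDS})
    return row


row = flatten(json.loads(RAW))
```

Keeping each tweet as a flat row makes the later filtering and ranking steps easier to express, at the cost of repeating the account fields for every tweet by the same user.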

| Sample field | Description |
| --- | --- |
| "created_at":"Mon Jan 26 22:17:07 +0000 2015" | The time when the tweet was created. |
| id:559837463074447360 | The unique identifier for this tweet. |
| text:"extend your battery life and protect your iphone 6 while saving 64% [Deals] http:\/\/t.co\/CsHXisph54 #iphone" | The content of this tweet. |
| in_reply_to_status_id:559806614761664512 | Indicates that the tweet is a reply to another tweet. The ID is that of the original tweet. This field can be null if the tweet is not a reply. |
| name:"kara JAV Cams" | The user's name as defined by the user. This is not necessarily the user's actual name. |
| location:"chester | The location of the account as defined by the user. This field can be null. |
| followers_count:1947 | The number of followers that the user currently has. |
| friends_count:1081 | The number of people that the user is following. |
| listed_count:11 | The number of public lists that the user is in. |
| favourites_count:3293 | The number of tweets that the user has marked as favourite. |
| statuses_count:64613 | The number of tweets posted by the user. |
| lang:"en" | The user interface language set by the user. It does not necessarily represent the language in which the tweet was posted. |

Table 1: Sample data collected from Twitter with description of specific fields

4.2 Classification of Twitter user accounts

After collecting data from Twitter, the user accounts have to be filtered so that the amount of data for further analysis is manageable. Filtering could also improve the accuracy of the result. There is a diverse range of Twitter accounts because different users create accounts for different purposes. Generally, Twitter accounts can be classified into three categories: personal accounts, supervised non-personal accounts and bot accounts.

Personal accounts belong to individual users, who can post anything without being filtered or controlled by anyone else. These accounts often exhibit a wide range of behaviours, and they are the only type of account that is useful for our project. Non-personal accounts, on the other hand, belong to organisations or groups of people with common interests. The activities of these accounts are often heavily controlled, and their tweets often express the opinion of the group as a whole instead of personal views. Therefore, they should be excluded from further analysis. Finally, bot accounts are computer programs that automatically generate tweets. These accounts could be spam or fake, which would affect the accuracy of our result, so they should also be eliminated.

4.3 Proposed framework

After collecting and filtering the data, a list of criteria has to be determined to help identify influencers. The criteria described below represent some of the properties that a user should possess in order to become an influencer, together with the methods for measuring them. It is also important to note that not all criteria can be implemented as automatic procedures, since analysis tools do not exist for all of them.

Activeness

Influencers are expected to actively share their opinions and interests with others. They must also have a strong motivation to share information with their followers. This criterion is compulsory for influencers because, without user activity, it is difficult to pass on information to others, which is the first step in influencing other people. The activeness of a user can be measured by looking at the number of tweets and retweets published by the user.

Experience

The experience of a user is closely related to the user's trustworthiness. People normally trust an individual more the more experienced that individual is. For example, children often trust the advice given by their parents in certain situations because they think that their parents, being older, might have gone through similar situations in the past.

In order to measure experience, the date on which a topic first emerged in public is compared with the date on which the user first posted about that topic. The longer the user has been posting about the topic, relative to how long the topic has existed, the more experienced the user is considered to be.
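The comparison of the two dates described above could be sketched as follows. This is only one possible reading of the measure: the `experience_score` function, its scaling to the range 0 to 1 and the example dates are assumptions made for illustration, not a procedure fixed by the framework.

```python
from datetime import date


def experience_score(topic_emerged: date, first_post: date, as_of: date) -> float:
    """Fraction of the topic's lifetime during which the user has been posting.

    Assumed interpretation: 1.0 means the user posted on the day the topic
    emerged; values near 0.0 mean the user joined the discussion only recently.
    """
    topic_age = (as_of - topic_emerged).days
    if topic_age <= 0:
        return 0.0  # the topic has no measurable lifetime yet
    user_age = (as_of - first_post).days
    # Clamp so that posts dated before the recorded emergence date, or after
    # `as_of`, cannot push the score outside [0, 1].
    return max(0.0, min(1.0, user_age / topic_age))


# Hypothetical dates: a topic that emerged 30 days before the evaluation date.
emerged = date(2015, 1, 1)
evaluated = date(2015, 1, 31)
early = experience_score(emerged, date(2015, 1, 1), evaluated)
late = experience_score(emerged, date(2015, 1, 25), evaluated)
```

Expressing the result as a fraction rather than a raw number of days keeps it comparable across topics of different ages, which matters if the criterion is later combined with others.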

Knowledge & Reputation

It is justifiable to say that influencers are often experts in the field they discuss the most. Besides, the amount of knowledge possessed by the user is correlated with the online reputation of the user. For example, a user with superior knowledge of a specific brand and its products will gain higher trust from others who are seeking advice on a product before purchasing it. According to Agarwal et al. (2008), the temporal pattern of the user determines the level of knowledge and reputation of that user. They claimed that users who are able to maintain a long-term influence may gain higher trust from others. Besides, these users are not just experts on a particular topic, but rather on topics of a similar theme. For example, users who know a lot about the iPhone 6 are often experts on other Apple products as well.

Activity generation

Influencers should have the ability to initiate discussion among others. This can be measured by calculating the average number of tweets posted by the user per day and the average number of comments that the user gets for each tweet. The average number of tweets posted per day indicates the user's activity, and can be calculated as follows:

\[ \text{Average activity rate} = \frac{\text{Number of tweets posted}}{\text{Age of account (in days)}} \tag{1} \]

Furthermore, if a tweet receives a large number of comments, it means that many people spend time reading it and exchanging thoughts about it, which indicates the level of influence that the tweet holds (Agarwal, Liu et al. 2008).

Activity rate + account age

According to Zhou et al. (2009), users who are very active and have been on social media for a long time are more likely to be considered influencers. Therefore, the combined measure of the average activity rate (1) and the account age can help us determine influencers:

\[ \text{Activity rate and account age} = 0.5 \times \text{Normalised average activity rate} + 0.5 \times \text{Normalised account age} \tag{2} \]
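Equations (1) and (2) can be worked through on a small example. The sketch below is illustrative only: the text does not say how the two quantities are normalised, so min-max scaling across the set of users is an assumption here, and the `User` class, the helper names and the three accounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    tweets_posted: int       # e.g. statuses_count in Table 1
    account_age_days: float


def average_activity_rate(user: User) -> float:
    """Equation (1): tweets posted divided by account age in days."""
    if user.account_age_days <= 0:
        return 0.0
    return user.tweets_posted / user.account_age_days


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; all-equal inputs map to 0.0 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(users: list[User]) -> dict[str, float]:
    """Equation (2): equal-weight mix of normalised rate and normalised age."""
    rates = min_max([average_activity_rate(u) for u in users])
    ages = min_max([u.account_age_days for u in users])
    return {u.name: 0.5 * r + 0.5 * a for u, r, a in zip(users, rates, ages)}


users = [
    User("alice", tweets_posted=6000, account_age_days=2000),  # 3 tweets/day
    User("bob", tweets_posted=1000, account_age_days=1000),    # 1 tweet/day
    User("carol", tweets_posted=200, account_age_days=100),    # 2 tweets/day
]
scores = combined_scores(users)
```

In this example "carol" is more active than "bob" but has a much younger account, so the two end up with similar combined scores. This is the trade-off that the equal weights in equation (2) imply.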

Recognition

The number of followers gives an estimate of the size of the audience. If the user has very few followers, his posts will have less impact on others. Besides, the average frequency with which a user's tweets are retweeted indicates the rate at which the message spreads across Twitter. For example, if a three-day-old tweet and a thirty-day-old tweet both have the same number of retweets, the former is said to have gained more recognition.

\[ \text{Average retweet frequency} = \frac{\text{Average number of retweets}}{\text{Average age of the tweet}} \tag{3} \]

Recognition can also be measured indirectly by looking at the number of inlinks (tweets referenced in other tweets). A high number of inlinks suggests that the tweet is highly recognisable by most people (Agarwal, Liu et al. 2008). Combining the number of followers, the average retweet frequency (3) and the number of inlinks therefore allows us to measure the level of recognition of the user.

Twitter Follower-Followee (TFF) Ratio

This is the ratio of a user's followers to their followees (the people whom the user follows), which categorises users into different types. If the ratio is close to 1 (nearly the same number of followers and followees), the user might be a listener or simply seeking knowledge. However, if the ratio approaches infinity, the user is probably very motivated to share information with others and is very confident in what he posts. Finally, if the ratio is close to 0 (a low number of followers but a high number of followees), the user can be treated as a spammer (Leavitt et al. 2009).

\[ \text{TFF Ratio} = \frac{\text{Number of followers}}{\text{Number of followees}} \tag{4} \]

Quality of the tweets posted

Users who frequently post nonsense on Twitter will not be considered influencers. This criterion has to be measured using a combination of two measures: the average number of retweets and the average favourite count. Retweeting is the action of broadcasting tweets from others to your own followers, which can also be seen as reinforcing the message. If someone retweets, it means that they find the tweet worth sharing with others. Therefore, a higher number of retweets indicates a higher quality of tweet. Next, the average favourite count measures how many times a user's tweet is marked as favourite by others. Twitter users often use this feature to express their acknowledgment and acceptance of the message, which could also indicate the quality of the tweet.

Novelty

Influencers must be innovative, in that they have a high tendency to accept new things and to share new ideas with others. This can be measured by comparing the date on which a topic first rose in popularity with the date on which the user first discussed the topic. In order to do this, a fairly current topic has to be identified from news sources and the date on which it first appeared has to be recorded. If the date of the user's first post about the topic is very near to the date on which it appeared in the news, the user might be one of the first users to tweet about that topic. To further increase the accuracy of determining the novelty of a post, it is important to check whether the user's tweets consist of references (e.g., retweets) to other tweets or links to other websites.

Centrality

The degree of centrality indicates how well the user is connected with others, and whether an individual plays a central role in the network. Centrality analysis can therefore be used to find important members in the Twitter network. Users who possess a high outdegree of centrality (the number of direct connections the user has to others) are expected to be influencers, because outdegree represents the ability of the user to socialise with others. Hanneman et al. (2005) suggested that these users are able to acquire resources easily and are therefore less dependent on others.
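The outdegree notion described above can be sketched on a toy network. The example is a minimal illustration under stated assumptions: the edge list is made up, an edge (source, target) is taken to mean that the source user interacts with the target user, and dividing by n - 1 is one common normalisation rather than something the framework prescribes.

```python
from collections import defaultdict

# Hypothetical directed edges (source, target): "source interacts with target".
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "ann"), ("cat", "ann"), ("dan", "cat")]


def degree_centrality(edges):
    """Return {user: (out_degree, in_degree)}, each normalised by n - 1."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    users = set()
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        users.update((src, dst))
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    denom = max(len(users) - 1, 1)  # avoid division by zero for tiny networks
    return {u: (len(out_nbrs[u]) / denom, len(in_nbrs[u]) / denom)
            for u in users}


centrality = degree_centrality(EDGES)
# The user with the highest normalised outdegree.
most_connected = max(centrality, key=lambda u: centrality[u][0])
```

Sets are used for the neighbour lists so that repeated interactions between the same pair of users count as one connection. Whether repeated interactions should instead add weight is a design choice left open here.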
In addition, the centrality of a user can be divided into three categories:

a) Degree centrality measures all the direct connections of the user, which indicates the user's ability to interact with others.

b) Betweenness centrality measures how strategic the user's position is in the network, i.e., the user is in a good position if he is on the shortest path between others. This indicates how well the user is able to control resources.

c) Closeness centrality indicates how independent a user is of others. If a user is less dependent on others, he has a higher degree of centrality because other users often depend on him.

All criteria described above will be used jointly to identify topic-specific influencers. In reality, however, not all criteria will be treated as equally important. The weight of each criterion depends on the application (the reason for identifying influencers). Since the weights change from one application to another, it is not up to us to decide on them. Therefore, our project focuses on a general approach in which we do not take different weights of criteria into consideration. We leave the assignment of weights to those who use our framework to identify influencers for various applications.

4.4 Evaluation

In order to determine the reliability of the proposed framework, it will be evaluated against a dataset of Twitter user accounts. The proposed framework will be applied to the data to identify a set of relevant topic-specific influencers. In order to compare the performance of our framework, various approaches proposed by other researchers will be applied to the same data, and the influencers determined by each approach will be compared with our result. Since most of those approaches are based on quantitative data and algorithms, we expect the final results to be different. This is because our approach takes into account qualitative measures, such as analysing the context of each tweet, and combines the criteria from both the traditional and online social network contexts.

For instance, when Apple introduced the new Apple Watch, many Twitter users shared what they liked or disliked about the product by mentioning Apple's Twitter account in their tweets. They mentioned Apple's account, instead of just writing "Apple", because they hoped that Apple would notice their tweets and respond to them. As a result, the PageRank algorithm would consider the Apple account an influencer, which should not be the case since Apple did not do anything. Therefore, it is necessary to look at more than one criterion to identify influencers, and it is also important to look at the content of the tweets.
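To make the joint use of criteria and the comparison between approaches more concrete, the sketch below combines per-user criterion scores with caller-supplied weights and then measures how far the resulting top-k list agrees with a follower-count baseline. Everything in it is hypothetical: the scores, the weighted mean used to combine them and the Jaccard overlap used to compare the two lists are illustrative choices, not the evaluation procedure of the project.

```python
def joint_score(criteria: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of criterion scores that are already scaled to [0, 1].

    Criteria missing from `weights` default to a weight of 1.0, so an empty
    `weights` dict treats every criterion as equally important.
    """
    total = sum(weights.get(name, 1.0) for name in criteria)
    if total == 0:
        return 0.0
    return sum(score * weights.get(name, 1.0)
               for name, score in criteria.items()) / total


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Users with the k highest scores, ties broken alphabetically."""
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


def jaccard(a, b) -> float:
    """Size of the intersection of two user sets divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical per-user criterion scores and a follower-count baseline.
users = {
    "ann": {"activeness": 0.9, "recognition": 0.8, "novelty": 0.7},
    "bob": {"activeness": 0.2, "recognition": 0.9, "novelty": 0.1},
    "cat": {"activeness": 0.6, "recognition": 0.3, "novelty": 0.9},
}
followers = {"ann": 500, "bob": 90000, "cat": 300}

equal = {}  # no weights supplied: every criterion counts equally
ours = top_k({u: joint_score(c, equal) for u, c in users.items()}, 2)
baseline = top_k(followers, 2)
agreement = jaccard(ours, baseline)
```

Here the follower-count baseline puts "bob" first, while the joint score ranks "bob" last, so the two top-2 lists share only one user. A low overlap of this kind shows that two approaches disagree; deciding which one is right still requires inspecting the accounts and their tweets.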

5. Project Plan

[Table 2: Project plan Gantt chart. Key: important dates; exam period.]

The literature review takes up most of the project time because various research papers have to be read in order to know what has been done by other researchers in the field and to generate ideas for designing our framework. The framework design phase was also given a longer period because the proposed framework is still in its early stages and will continue to evolve as more insights are gained from brainstorming and other research papers. Next, we have to evaluate our proposed framework by identifying a current topic on Twitter and a manageable dataset of Twitter accounts. The progress of these phases was delayed and the expected finish time was shifted, because problems were encountered during data collection using the API due to recent changes made by Twitter to the authentication method.

The grey area in Table 2 represents the examination period, during which more attention has to be given to revision for other modules. Once the exam period is over, we will start gathering a dataset of user accounts for the chosen Twitter topic. At the same time, we have to monitor the life-cycle of the chosen topic and use this data together with the user accounts to gain insights into influential users. Both of these phases are expected to take at least seven weeks. They were given a long period because extra work has to be done to filter the data, such as differentiating personal accounts from non-personal accounts. Besides, it is important to eliminate irrelevant data carefully so that we have a manageable dataset for the evaluation of the framework.

The evaluation stage will involve testing the collected data against the list of criteria described in the proposed framework to identify a set of influencers. We will also compare our framework with others by applying other frameworks to the same dataset and observing the outcome. A four-week period was also set aside as a contingency in case more time is required for data collection and evaluation. Finally, sufficient time was allocated to write and produce a high-quality dissertation.

6. Conclusion

The aim of the project is to determine key influencers in online social media. The motivation for undertaking this project is the rise in popularity of viral marketing as a strategy for marketers to promote their brands and products. In order to use it, they have to identify online influencers accurately. Twitter was chosen as the research platform because it is more of a broadcast medium than other social media platforms, which makes it well suited for marketers to launch marketing campaigns.

Related work suggested that activity generation, eloquence, novelty and recognition form the basic criteria for influencers. Over the past few years, a few other criteria have also been added to the list. Furthermore, researchers have generally used the number of followers, retweets and mentions to measure influence. However, influencers are found to be far more complicated and cannot be accurately identified using quantitative measures alone. Therefore, we proposed a framework which looks at influencers in both the traditional and online social network contexts. Furthermore, qualitative measures such as the quality of the content posted by the user are taken into account. It is important to note that all the criteria should be used jointly to measure influence accurately.

Overall, the project is progressing as planned despite some minor problems faced during data collection. In addition, sufficient time was assigned to each project phase to ensure that the project can be completed on time.

References

Agarwal, N., et al. (2008). Identifying the influential bloggers in a community. Proceedings of the 2008 international conference on web search and data mining, ACM.
Akritidis, L., et al. (2009). Identifying influential bloggers: Time does matter. Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT'09. IEEE/WIC/ACM International Joint Conferences on, IET.
Bennett, S. (2012). Twitter Users Often Retweet Without Reading Or Clicking Links, Study Reveals.
Brown, P. E. and J. Feng (2011). "Measuring user influence on twitter using modified k-shell decomposition."
Cha, M., et al. (2010). "Measuring User Influence in Twitter: The Million Follower Fallacy." ICWSM 10(10-17): 30.
Domingos, P. (2005). "Mining social networks for viral marketing." IEEE Intelligent Systems 20(1): 80-82.
Hanneman, R. A. and M. Riddle (2005). Introduction to social network methods, University of California Riverside.
Keller, E. and J. Berry (2003). "The Influentials." Concentrated Knowledge for the Busy Executives 25(5).
Kemp, S. (2015). Digital, Social & Mobile Worldwide in 2015.
Kwak, H., et al. (2010). What is Twitter, a social network or a news media? Proceedings of the 19th international conference on World wide web, ACM.
Leavitt, A., et al. (2009). The Influentials: New Approaches for Analyzing Influence on Twitter. Web Ecology Project.
McPherson, M., et al. (2001). "Birds of a feather: Homophily in social networks." Annual review of sociology: 415-444.
Mowday, R. T. (1978). "The exercise of upward influence in organizations." Administrative Science Quarterly: 137-156.
O'Reilly, T. (2009). What is web 2.0, O'Reilly Media, Inc.
Page, L., et al. (1999). "The PageRank citation ranking: Bringing order to the web."
Provost, L. (2012). Achat de followers sur Twitter: nous avons fait le test et acheté 50.000 abonnés.
Quercia, D., et al. (2011). In the mood for being influential on twitter. Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, IEEE.
Romero, D. M., et al. (2011). Influence and passivity in social media. Machine learning and knowledge discovery in databases, Springer: 18-33.
Tedeschi, J. T., et al. (1972). "The exercise of power and influence: The source of influence." The social influence processes: 287-345.
Watts, D. J. and P. S. Dodds (2007). "Influentials, networks, and public opinion formation." Journal of consumer research 34(4): 441-458.
Weng, J., et al. (2010). Twitterrank: finding topic-sensitive influential twitterers. Proceedings of the third ACM international conference on Web search and data mining, ACM.
Westbrook, R. A. (1987). "Product/consumption-based affective responses and postpurchase processes." Journal of marketing research: 258-270.
Wilson, R. F. (2000). "The six simple principles of viral marketing." Web Marketing Today 70(1): 232.
Wu, S., et al. (2011). Who says what to whom on twitter. Proceedings of the 20th international conference on World wide web, ACM.
Ya-ting, L. and C. Jing-min (2011). The social network analysis of political blogs in people: Based on centrality. Consumer Electronics, Communications and Networks (CECNet), 2011 International Conference on, IEEE.
Zhou, H., et al. (2009). Finding leaders from opinion networks. Intelligence and Security Informatics, 2009. ISI'09. IEEE International Conference on, IEEE.