Reputation Management System




Mihai Damaschin, Matthijs Dorst, Maria Gerontini, Cihat Imamoglu, Caroline Queva
May 2012


Chapter 1 Introduction

Word-of-mouth marketing has always been an important success factor for consumer-oriented businesses, and brand reputation is an important part of the value of a company. Today, with social media, the reputation of a brand, product or service can change much more rapidly than before, and the range of consumer sentiment and attitude is much wider. Although the reputation of brands, products and services used to be difficult to track, it can now be tracked through what is written about them online, e.g., in micro-blogs such as Twitter. Brand reputation mining, that is, monitoring social media sources for language that may affect reputation positively or negatively, can be a powerful tool for an organization's public relations and marketing departments.

To address these issues, we developed an online reputation management system. The system takes a keyword (the name of a company, for instance) and classifies the opinions expressed in tweets about it as positive, negative or neutral. Moreover, it measures how strong those opinions are and takes into account how influential and authoritative their authors are on Twitter. Although our system uses techniques from machine learning, information retrieval and natural language processing, the results are presented to the reader in a simple and understandable manner.

Chapter 2 Methods

2.1 Search

2.2 Get tweets

To do any kind of analysis, we first needed to retrieve data about the tweets. Besides the tweet text itself, we also needed information about the users who posted the tweets and properties such as the number of retweets, followers and favorites. Twitter offers access to most of its functionality through a REST API. Rather than writing our own Java/REST connector, we used the twitter4j library. Although it is not an official library, it gave us direct access to the data through Java method calls. On top of this library we added another layer specific to our use case.

One problem we faced was the rate limit imposed by the API: our program can make only 150 calls per hour when not authenticated and 350 otherwise. For the tweet data we solved this issue with a common strategy, caching. We extended classes in the twitter4j library so that our methods work regardless of where the data actually comes from. For the user data, on the other hand, this limitation meant that we could not implement PageRank on a user graph.

2.3 Sentiment Analysis

2.3.1 Tokenization

Before the tweets can be analyzed, some preprocessing is needed. The first step is the tokenization of the tweets. At the beginning of the project we used a simple tokenizer that split the text on whitespace. With this approach, a word and its adjacent punctuation can end up in the same token, so the tokenizer was too limited and needed improvement to handle punctuation. The final tokenizer uses the StandardTokenizer class of Lucene (Apache, http://lucene.apache.org/). This tokenizer splits words at punctuation characters, removing the punctuation, and at hyphens, unless there is a number in the token, in which case the whole token is interpreted as a product number and is not split. It also recognizes email addresses and internet hostnames as single tokens.
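The tokenization rules described above can be illustrated with a small sketch. This is only a regex-based approximation in Python, written for illustration: it is not Lucene's grammar and not the code used in the system. The names TOKEN_PATTERN and tokenize, and the example sentence, are hypothetical.

```python
import re

# Approximation of the described rules (an assumption, not Lucene's grammar).
# Alternatives are tried in order, so the more specific patterns come first.
TOKEN_PATTERN = re.compile(
    r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
    r"|(?P<host>[A-Za-z0-9][A-Za-z0-9-]*(?:\.[A-Za-z0-9-]+)+)"
    r"|(?P<hyphenated>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+)"
    r"|(?P<word>[A-Za-z0-9]+)"
)


def tokenize(text):
    """Split text into tokens, dropping punctuation between them."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group(0)
        if match.lastgroup == "hyphenated" and not any(c.isdigit() for c in token):
            # No digit in the token: split at the hyphens.
            tokens.extend(token.split("-"))
        else:
            # Email addresses, hostnames, plain words, and hyphenated tokens
            # containing a digit ("product numbers") are kept whole.
            tokens.append(token)
    return tokens


print(tokenize("I love my well-known XJ-900, mail info@example.com or see www.example.com!"))
```

In this sketch "well-known" is split into two tokens, while "XJ-900" stays whole because it contains a digit; the email address and the hostname are each kept as one token.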

2.3.2 Lexicon

Once the tweets about the company we are searching for have been retrieved, they must be analyzed to determine their sentiment. Each tweet receives one of three labels: negative, neutral or positive. A tweet is neutral when its sentiment score equals 0. For this analysis we use a sentiment lexicon, which associates sentiment words with scores, for example (good: +1), (bad: -1), (like: +1), (hate: -2). Using the lexicon is straightforward: summing the scores of all tokens gives the overall sentiment score of a tweet. However, this method can be improved with some heuristics.

The first issue is negation. We added negations to the database, and when a negation is found in a sentence, the scores of the two following words are reversed by multiplying them by -1. For example, "This is not good" gives a score of (-1)*(+1) = -1.

In the same way, we added modifiers to the analysis. Modifiers are words like "really", "very" or "quite". Each modifier multiplies the sentiment score of the following word by a factor proportional to its strength: for example, "very" multiplies by 2, whereas "quite" multiplies by 0.5.

It is also important to handle sentences like "I love companyA but I hate companyB". For this we added a notion of distance to the analysis. The distance increases the weight of sentiment words close to the name of the company and reduces the weight of sentiment words far away. A Gaussian distribution is used with a maximum distance of 4; this means that sentiment words far from the name of the company (4 words between them) are not taken into account.

2.4 PageRank

2.5 User Interface
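Returning to the lexicon heuristics of Section 2.3.2 (score summation, negation, modifiers and distance weighting), the following Python sketch shows one way they could fit together. It is an illustration under stated assumptions, not the system's actual code: the lexicon holds only the four example entries from the text, the Gaussian width SIGMA is an assumed value (the report does not give one), "maximum distance of 4" is read as "a token more than 4 positions away from the company name is ignored", and all names (score_tweet, distance_weight, label) are hypothetical.

```python
import math

LEXICON = {"good": 1.0, "bad": -1.0, "like": 1.0, "hate": -2.0}  # examples from the text
NEGATIONS = {"not"}
MODIFIERS = {"very": 2.0, "quite": 0.5}
NEGATION_SCOPE = 2   # a negation reverses the two following words
MAX_DISTANCE = 4     # assumed reading of the cut-off described in the text
SIGMA = 2.0          # assumed Gaussian width; not specified in the report


def distance_weight(distance):
    """Gaussian weight that decays with distance and is cut off beyond MAX_DISTANCE."""
    if distance > MAX_DISTANCE:
        return 0.0
    return math.exp(-(distance ** 2) / (2 * SIGMA ** 2))


def score_tweet(tokens, target=None):
    """Sum lexicon scores, applying negation, modifiers and optional distance weighting."""
    tokens = [t.lower() for t in tokens]
    target_positions = [i for i, t in enumerate(tokens) if target and t == target.lower()]
    total = 0.0
    negate_left = 0          # how many upcoming words are still negated
    pending_modifier = 1.0   # factor applied to the next scored word
    for i, tok in enumerate(tokens):
        negated = negate_left > 0
        if negated:
            negate_left -= 1
        if tok in NEGATIONS:
            negate_left = NEGATION_SCOPE
            continue
        if tok in MODIFIERS:
            pending_modifier = MODIFIERS[tok]
            continue
        score = LEXICON.get(tok, 0.0) * pending_modifier
        pending_modifier = 1.0
        if negated:
            score = -score
        if target_positions:
            distance = min(abs(i - p) for p in target_positions)
            score *= distance_weight(distance)
        total += score
    return total


def label(score):
    """Map a score to one of the three labels; a score of exactly 0 is neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(score_tweet("This is not good".split()))  # -1.0, as in the example in the text
tweet = "I like companyA but I hate companyB".split()
print(label(score_tweet(tweet, target="companyA")), label(score_tweet(tweet, target="companyB")))
```

With these assumed parameters, the same tweet is labeled positive for companyA and negative for companyB, because each sentiment word is weighted by its distance to the company name being searched.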

Chapter 3 Results

3.1 Results

Chapter 4 Related Work

4.1 Related Work

Chapter 5 Conclusions

5.1 Conclusion
