SPAN White Paper
Sentiment Analysis on Big Data: Machine Learning Approach

Several sources on the web provide deep insight into people's opinions on the products and services of various companies. On social networking sites such as Facebook, Twitter, Google+, LinkedIn and YouTube, as well as on blogs and discussion forums, users voice their opinions openly and loudly. The data captured from these sites is usually unstructured and huge in volume, and analyzing such massive content manually is a tedious task. This is where SPAN gains an edge through its carefully developed solution: data from all of the above sources, going back multiple years, can be collected and processed to derive concise results. Our real-time search technology enables us to extract, simultaneously, a complete body of expressions from many users across these sources on any given subject. This white paper describes techniques by which the sentiments of users on multiple social forums can be analyzed to gain meaningful and actionable insight.
SPAN Prediction Engine
SPAN's Prediction Engine on big data retrieves posts, comments or tweets about a company or a product to obtain predictive insights into consumers' thought processes. To analyze the data and quantify the moods of individuals on social forums, the tool uses an algorithm developed specifically to analyze sentiment in social media conversations on a massive scale. The Prediction Engine examines the data from social sources, scores each post, comment or tweet on a sentiment scale, and classifies the text into one of five categories: positive, negative, campaign, reply or query.

Instances: Social Media Sources
The following post expresses negative sentiment and is classified as negative: "What a horrible company with a horrible customer service and horrible attitudes."
On the other hand, this tweet is classified as positive: "Nina was incredibly helpful and definitely made me a lot happier with your service."
A post is classified as a campaign if the company publishes it as an advertisement, for example: "Try out our new 4G data plan for this month."
The same applies to the other categories, reply and query. In response to a question on customer satisfaction, a typical reply would be: "I found the service to be good, and prompt." A consumer who wants to find a service center might post a query such as: "Where is the service center nearest to my location?"
[Figure: SPAN's Predictive Engine sorting posts into Positive, Negative, Reply, Campaign and Query]

When each post expresses sentiment that falls into one of the above categories, it becomes possible to compute a statistical model that captures and quantifies how people feel about something, as expressed in these social forums. A variety of sentiment analysis methods exist for analyzing all types of content from general news sources and other public data sources. The SPAN Prediction Engine achieves a relatively high accuracy: a 78 percent agreement rate with manually reviewed content.
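The paper does not disclose the algorithm behind the Prediction Engine. As a minimal sketch of what supervised five-way classification can look like, the Python below uses multinomial naive Bayes with add-one smoothing, a common baseline swapped in here purely for illustration. The class name `NaiveBayesSentiment`, the `tokenize` helper and the tiny training set (the example posts quoted above) are hypothetical; a usable model would need thousands of labeled posts.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens; '?' is kept as a token."""
    return re.findall(r"[a-z0-9']+|\?", text.lower())


class NaiveBayesSentiment:
    """Hypothetical five-way classifier: multinomial naive Bayes, add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training posts
        self.word_counts = defaultdict(Counter)  # label -> token -> occurrences
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (text, label) pairs, labeled manually."""
        for text, label in examples:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        """Return the label with the highest log posterior for `text`."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: the example posts quoted in this paper.
TRAINING_POSTS = [
    ("What a horrible company with a horrible customer service and horrible attitudes.", "negative"),
    ("Nina was incredibly helpful and definitely made me a lot happier with your service.", "positive"),
    ("Try out our new 4G data plan for this month", "campaign"),
    ("I found the service to be good, and prompt.", "reply"),
    ("Where is the service center nearest to my location?", "query"),
]

model = NaiveBayesSentiment()
model.train(TRAINING_POSTS)
print(model.predict("Where is the nearest service center?"))  # query
```

With a single example per category, the model keys mostly on shared words such as "where" and "?", so the output demonstrates the mechanics of the technique rather than its accuracy.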
The 78 percent figure compares well with human performance: even human reviewers typically agree with one another only about 80 percent of the time. Our tool processes the sentiment of every single post on social forums, allowing the application to separate the mood around a particular product from shifts in the overall mood of the moment. If the sentiment for Product X is low on a Monday morning, is it because people are unhappy with the product, or because sentiment for all terms is more negative on that Monday morning? Using the SPAN Prediction Engine, you can analyze the general mood patterns of individuals to determine the true sentiment for any specific term.
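One simple way to separate product-specific sentiment from the mood of the moment, offered as an assumption rather than as SPAN's documented method, is to subtract each day's average sentiment over all posts from that day's average for posts mentioning the term. The scoring scheme (+1 positive, -1 negative, 0 otherwise), the `relative_sentiment` function and the sample posts are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scoring: campaign, reply and query posts count as neutral (0).
SCORES = {"positive": 1.0, "negative": -1.0}


def relative_sentiment(posts, term):
    """Per-day mean score of posts mentioning `term`, minus that day's mean
    score over all posts (the overall mood of the moment).

    posts: iterable of (day, text, label) tuples.
    Matching is a plain case-insensitive substring test, for simplicity.
    """
    all_scores = defaultdict(list)
    term_scores = defaultdict(list)
    for day, text, label in posts:
        score = SCORES.get(label, 0.0)
        all_scores[day].append(score)
        if term.lower() in text.lower():
            term_scores[day].append(score)
    return {day: mean(scores) - mean(all_scores[day])
            for day, scores in term_scores.items()}


posts = [
    ("Mon", "Product X keeps crashing", "negative"),
    ("Mon", "Stuck in traffic again", "negative"),
    ("Mon", "Rainy commute, awful start to the week", "negative"),
    ("Mon", "Product X support fixed my issue fast", "positive"),
]
print(relative_sentiment(posts, "Product X"))  # {'Mon': 0.5}
```

Product X's raw Monday score is 0.0, which looks mediocre on its own, but against a Monday baseline of -0.5 it sits half a point above the general mood.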
Machine Learning
Machine learning deals with the construction and study of intelligent systems that are designed to identify changes in the data at hand and adapt their algorithms to accommodate new findings. For example, a machine learning system could be made to adapt continuously, based on buyer opinion, to rate health, life or automotive insurance policies with respect to coverage, duration, premium, benefits, popularity, and so on. For an insurance service provider, this can substantially improve its success in selling its products: ratings based on buyer sentiment can be used to recommend a policy that meets a buyer's expectations. When we gather large volumes of direct or indirect opinions, views, interests and perspectives, we need to apply learning algorithms to generalize from them or to identify new points of interest.

Machine learning poses many scientific and engineering challenges. The statistics of the collected data shift rapidly in real time, and so do the features of interest and the views expressed. Hence, the machine learning algorithms need to be continuously adaptive. For increased reliability, multiple algorithms should be applied to the data and their results consolidated. The machine learning algorithms used for the sentiment analysis described in this paper are supervised learning algorithms. As training data continues to arrive, the prediction accuracy of the learning engine increases. The learning engine is generic in nature and can be used for a variety of applications across multiple domains.

Sentiment Analysis
For this analysis, the SPAN Prediction Engine was applied over extended time periods to all the social media data, isolating only those conversations that referenced a telecom service provider. This enabled us to understand how people actually felt when the company released a product or raised its tariff for existing customers.
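The Machine Learning section calls for applying several algorithms and consolidating their results, but does not say how the results are consolidated. A weighted majority vote is one common option and is assumed here; the `consolidate` helper and the weights in the example are hypothetical.

```python
from collections import defaultdict


def consolidate(predictions, weights=None):
    """Weighted majority vote over the labels predicted by several models.

    predictions: one label per model, in a fixed model order.
    weights: optional per-model weights (for example, each model's accuracy
    on held-out data); every model counts equally when omitted.
    Ties go to the label that appears first in `predictions`.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if not predictions or len(weights) != len(predictions):
        raise ValueError("need one weight per prediction and at least one prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # max() returns the first maximal key, and dicts keep insertion order.
    return max(totals, key=totals.get)


print(consolidate(["negative", "negative", "query"]))  # negative
print(consolidate(["query", "negative", "query"], weights=[0.3, 0.9, 0.3]))  # negative
```

Weighting lets a model with a better track record outvote two weaker ones, as in the second call, while the unweighted default reduces to a plain majority.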
We compared the SPAN Prediction Engine's output for these posts across different social media, and also quantified the volume of keywords related to the company or its products. As depicted in the image above, the negative sentiment expressed on social forums each day exceeded the positive sentiment for that month. The graph shows the trend of negative comments posted during the month in which the telecom company released a service.
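The isolation and daily tallying described above can be sketched as follows, assuming each post already carries a predicted label and a day. The company name `AcmeTel`, the helper names and the sample posts are hypothetical; matching uses whole-word, case-insensitive regular expressions so that a keyword is not found inside unrelated words.

```python
import re
from collections import Counter, defaultdict


def daily_sentiment_counts(posts, keywords):
    """Tally sentiment labels per day, keeping only posts that mention
    at least one keyword (case-insensitive, whole-word match).

    posts: iterable of (day, text, label) tuples.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    counts = defaultdict(Counter)
    for day, text, label in posts:
        if pattern.search(text):
            counts[day][label] += 1
    return dict(counts)


def negative_dominant_days(counts):
    """Days on which negative posts outnumber positive ones."""
    return sorted(day for day, c in counts.items() if c["negative"] > c["positive"])


posts = [
    (1, "AcmeTel raised the tariff again", "negative"),
    (1, "Why is my AcmeTel bill higher this month?", "query"),
    (1, "Loving the new AcmeTel 4G plan", "positive"),
    (1, "AcmeTel support never answers", "negative"),
    (2, "AcmeTel fixed my connection quickly", "positive"),
    (2, "Great weather today", "positive"),  # no keyword, so it is ignored
]
counts = daily_sentiment_counts(posts, ["AcmeTel"])
print(negative_dominant_days(counts))  # [1]
```

Plotting `counts[day]["negative"]` and `counts[day]["positive"]` over a month gives the kind of daily trend the graph describes.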
Basic Building Blocks in Sentiment Analysis
Training Sets: fetched from HDFS; posts and tweets are labeled manually based on the nature of their sentiment.
Data Pre-Processing: training sets and input sources are processed with NLP.
Sentiment Analysis: a model is built from the training set and predicts sentiments for posts and comments from the input source.
Sentiment Scores: the prediction output from the previous step is shown in reports.
Input Data Source (Big Data): the data source resides in HDFS; posts and tweets are fetched for the product or company.
Lexicons and Linguistic Resources: libraries used to carry out the NLP.

[Figure: Implementation Model, with components Unstructured Data, Using Hadoop, User Web Portal, Learning Engine, Reporting Engine, Report, Product Services, Structured Data and Visualization]

Statistical Model
A statistical model was built from thousands of training examples that were carefully tagged by hand. This model was then applied to the next set of social feeds about the telecom company from different social sources, which enabled us to determine sentiments with an accuracy rate of 78 percent. Accuracy above 60 percent is considered acceptable in predictive analytics, since most sentiment analysis models tag sentiments in three categories: positive, negative and neutral. We have further divided neutral sentiment into reply, query and campaign categories.
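The paper reports the model's accuracy as its agreement with manually tagged posts. Assuming that definition, agreement = (number of posts where the predicted and manual labels match) / (number of posts), so 78 percent means 78 matches per 100 manually reviewed posts. The sketch below computes that ratio and also tallies which manual categories are mistaken for which; the function names and labels are hypothetical.

```python
from collections import Counter


def agreement_rate(predicted, manual):
    """Fraction of posts whose predicted label matches the manual label."""
    if len(predicted) != len(manual) or not manual:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(manual)


def confusion_pairs(predicted, manual):
    """Count (manual, predicted) pairs for the posts the model got wrong."""
    return Counter((m, p) for p, m in zip(predicted, manual) if p != m)


manual = ["negative", "positive", "query", "query", "campaign"]
predicted = ["negative", "positive", "query", "reply", "negative"]
print(agreement_rate(predicted, manual))  # 0.6
print(confusion_pairs(predicted, manual))
```

The confusion tally matters more with five categories than with three: it shows, for instance, whether queries are being absorbed into replies, which a single accuracy figure hides.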
As depicted in the image above, the daily volumes of negative posts and queries on these social forums were found to be correlated. The graph shows the correlation between negative comments and queries, while replies stay at the lower end. It indicates that spikes in the negative graph coincide with a rising percentage of queries. The graph was validated by checking user responses on the company's social media page. Such a view gives a company several insights, including the ideal time to post a campaign about a new offering so that it draws more positive responses than negative or neutral ones.

The image above shows the times of day when most customers are active, which is mostly late at night. Activity spikes at 8 PM and keeps rising until midnight, which indicates that a company should post a campaign or an ad about its new product between 8 PM and midnight. Besides identifying the ideal time to post a campaign or an ad, you can also find the top influencers and the words people use most in their conversations, to understand what users of different age groups expect from a product or service. The image shows the top influencers and the words used in such conversations.
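Both analyses described above reduce to simple statistics. As a sketch under stated assumptions, the correlation between daily negative and query counts can be measured with the Pearson coefficient, and the most active posting window can be found by counting posts per hour of day. The helper names and the data are hypothetical. A high coefficient shows that the two series move together; on its own it does not show which one drives the other.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def busiest_hours(post_hours, top=4):
    """The `top` hours of day (0-23) with the most posts, busiest first."""
    return [hour for hour, _ in Counter(post_hours).most_common(top)]


# Hypothetical daily counts over five days.
negatives = [10, 14, 9, 20, 17]
queries = [5, 7, 4, 11, 8]
print(round(pearson(negatives, queries), 3))  # 0.984

# Hypothetical posting hours (24-hour clock) for one day's posts.
hours = [20, 21, 21, 22, 23, 23, 23, 9, 13, 22, 21, 20]
print(busiest_hours(hours))
```

A window such as 8 PM to midnight then corresponds to hours 20 through 23 dominating the `busiest_hours` result.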
Conclusion
With millions of conversations occurring on social media each day, the science of extracting relevant data and using statistics to quantify how people express themselves has become a rapidly evolving discipline. There are significant advantages to identifying correlations between social sentiment and product marketing when you can apply search techniques to social data and extract only those conversations related to your company or product. When sentiment analysis is applied to such a focused set of conversations over longer durations, it yields precise outcomes that open up new avenues for a company to enhance the value of its product and service portfolio. SPAN's analytical solution provides additional results as they become available and allows for deeper R&D, thereby improving an organization's overall capabilities.

For more information on our entire range of solutions and related offerings, get in touch with:

About SPAN:
SPAN is an established software services company offering comprehensive IT services since Our clients include Fortune 1000 companies, software firms (ISVs) and tech start-ups. SPAN's offshore development centers in India are certified for ISO 9001:2008 and ISO 27001:2005 and appraised at CMMI Maturity Level 5 and PCMM Maturity Level 5. SPAN has a global footprint with offices in the U.S., Singapore and India, and group offices in Europe. There are multiple offshore development centers in Bangalore and Chandigarh, India. SPAN is ranked #7 among the Best IT Employers in India by a leading IT publication. SPAN's Relationship Management (RM) Model is a well-defined, yet flexible framework, which provides ongoing business wholly owned by the largest Nordic IT services major, EVRY (www.evry.com).

USA Headquarters
SPAN Systems Corporation
1425 Greenway Drive, Suite 490
Irving, Texas
Phone: / SPAN-SYS
Fax:

India Headquarters
SPAN Infotech (India) Pvt. Ltd.
18/2, Vani Vilas Road, Basavanagudi
Bangalore, India
Phone:
Fax:

Copyright 2015 by SPAN. All rights reserved. The contents of this document are protected by copyright law and international treaties. SPAN acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in this document. The reproduction or distribution of this document or any portion thereof, in any form or by any means, without the prior written permission of SPAN is prohibited.