Text Analytics Beginner's Guide: Extracting Meaning from Unstructured Data
Contents: Text Analytics, Use Cases, Terms, Trends, Scenario, Resources
© 2013 Angoss Software Corporation. All rights reserved.
Text Analytics
Successful companies today listen to and understand what customers are saying, and they act on that feedback by using text analytics to incorporate the voice of the customer (VOC) into business strategies for sales, marketing and customer service. Powerful trends in social media, e-discovery, customer service (call center transcriptions of voice calls, customer complaint emails and instant messaging) and customer-centric business strategies are driving IT leaders to consider text analytics as a business tool. The structured information that text analytics produces can be combined with structured data (e.g., sales and demographic data) and analyzed using various business intelligence, predictive or automated discovery techniques.
Text Analytics: What is Text Analytics?
Text analytics describes a set of linguistic, statistical and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research or investigation. Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can be leveraged in various ways.
Text Analytics: Structured Data
Today, 80% of business information originates in unstructured data, primarily text with no identifiable structure, although structured data continues to be the primary source for business intelligence.
Text Analytics: Unstructured Data
Internal: Emails, Customer Surveys, Documents, Call Center Notes, Claims Records, Customer Forms, Customer Letters
External: Blogs, Social Media, Tweets, Online Forums, Articles / Reports, Web
Use Cases
Text analytics transforms unstructured data into structured data for analysis to help:
- Monitor and analyze brand reputation
- Determine purchase behavior
- Identify product issues
- Summarize surveys and customer reviews
- Improve customer service and customer experience management
- Understand customer feedback
- Improve customer retention
- Predict and reduce churn
- Identify and reduce claims fraud
- Develop cross-sell and upsell strategies
- Design next best offer strategies
Use Cases
Marketing: Voice of customer, Social media analysis, Churn analysis, Market research, Survey analysis
Business: Competitive intelligence, Document categorization, Human resources, Records retention, Risk analysis, Website navigation, News feeds analysis
Industry-Specific: Fraud detection, E-discovery, Warranty analysis, Medical research
Terms
Given a collection of text, text analytics tells you who, where, when, what and how, so that you can figure out why.
1. Entity: Who is being discussed, and where and when?
2. Theme: What are the important words?
3. Classification: What are the important concepts?
4. Sentiment: How is the conversation going? Is it positive or negative?
Terms: Entity
Who is being discussed, and where and when?

Yahoo wants to make its Web e-mail service a place you never want to, or more importantly have to, leave to get your social fix. The company on Wednesday is releasing an overhauled version of its Yahoo Mail Beta client that it says is twice as fast as the previous version, while managing to tack on new features like an integrated Twitter client, rich media previews and a more full-featured instant messaging client. Yahoo says this speed boost should be especially noticeable to users outside the U.S. with latency issues, due mostly to the new version making use of the company's cloud computing technology. This means that if you're on a spotty connection, the app can adjust its behavior to keep pages from timing out, or becoming unresponsive. Besides the speed and performance increase, which Yahoo says were the top users requests, the company has added a very robust Twitter client, which joins the existing social-sharing tools for Facebook and Yahoo.

Entity      Type
Yahoo       Company
Twitter     Company
Facebook    Company
U.S.        Place
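As a rough illustration of entity extraction, the sketch below tags entities by looking names up in a small dictionary (a gazetteer). This is a deliberately simple stand-in: the guide does not describe how its entities are detected, and the `GAZETTEER` table and `find_entities` function are hypothetical names introduced here, with entries copied from the result table above.

```python
import re

# Hypothetical gazetteer: entity name -> type, copied from the slide's result table.
GAZETTEER = {
    "Yahoo": "Company",
    "Twitter": "Company",
    "Facebook": "Company",
    "U.S.": "Place",
}


def find_entities(text):
    """Return {entity: (type, mention_count)} for gazetteer names found in text."""
    found = {}
    for name, kind in GAZETTEER.items():
        # Match the name only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        count = len(re.findall(pattern, text))
        if count:
            found[name] = (kind, count)
    return found


sample = (
    "Yahoo says this speed boost should be especially noticeable to users "
    "outside the U.S. The company has added a very robust Twitter client, "
    "which joins the existing social-sharing tools for Facebook and Yahoo."
)

for name, (kind, count) in find_entities(sample).items():
    print(f"{name:10} {kind:8} mentions={count}")
```

A dictionary lookup only finds names it already knows; recognizing unseen names generally calls for statistical or rule-based recognizers.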
Terms: Theme
What are the important words being used?

Applied to the same Yahoo Mail passage shown on the Entity slide:

Theme                         Score
Cloud computing technology    4.11
E-mail service                2.672
Top users requests            2.669
Terms: Classification/Concepts
What are the important, high-level concepts?

Applied to the same Yahoo Mail passage shown on the Entity slide:

Concept                  Score
Software and Internet    0.56
Social Media             0.60
Technology               0.49
Business                 0.72
Terms: Sentiment
How is the conversation going? Positive or negative?

Applied to the same Yahoo Mail passage shown on the Entity slide:

Entity      Sentiment
Yahoo       0.534
Twitter     0.48
Facebook    0.534

Concept                  Sentiment
Software and Internet    0.0
Social Media             0.48
Technology               0.49

Theme                         Sentiment
Cloud computing technology    1.3
Mail service                  0.16
Top user requests             0.83
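One common and simple way to produce a sentiment score is to average the polarity of opinion words found in the text. The sketch below assumes a tiny hand-made lexicon; the words, the weights and the `sentiment_score` function are illustrative assumptions and are not how the scores on this slide were computed.

```python
import re

# Hypothetical polarity lexicon; the weights are illustrative only.
LEXICON = {
    "fast": 1.0,
    "robust": 1.0,
    "rich": 0.5,
    "boost": 0.5,
    "spotty": -1.0,
    "unresponsive": -1.0,
    "issues": -0.5,
}


def sentiment_score(text):
    """Average the lexicon weights of the words that appear in the text.

    Returns 0.0 (neutral) when no lexicon word is present.
    """
    words = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


positive = "The company has added a very robust Twitter client."
negative = "On a spotty connection the app can become unresponsive."

print(sentiment_score(positive))
print(sentiment_score(negative))
```

Scoring each sentence separately and attributing the score to the entities, themes or concepts it mentions is one way to arrive at per-item sentiment like the tables above.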
Trends
1. Social media analytics adoption drives text analytics.
2. Analytics moves beyond sentiment analysis.
3. The market begins to get the connection between text and Big Data.
4. Marrying structured and unstructured data becomes more popular.
5. The cloud becomes more popular for text analytics.
Source: Text Analytics Victory Index Report, January 2013
Scenario: Book Reviews (Customer Feedback)
An online book retailer tracks customer feedback by analyzing reviews and comments from online forums and social media. They use Angoss KnowledgeREADER to extract meaning from the text, to discover what is being discussed and whether the sentiment is positive or negative, and to answer:
- What are customers saying on a regional basis?
- How frequently do certain entities, themes and topics occur?
- Which themes and topics occur together, and are related?
- How is sentiment trending over time?
- What is the context of what is being discussed at the document level?
Scenario: Sentiment Dashboard
The dashboard shows:
- Sentiment breakdown across all reviews
- Sentiment distribution across all documents
- Sentiment distribution for the Top 10 topics, themes and entities
Scenario: Comparison Analysis
The retailer can compare overall sentiment across stores, or isolate individual topics, themes, entities and phrases to determine how those items are discussed in different regions. For example, you can see that the topic Technology is viewed more negatively in Store 2, but it is also discussed more frequently there.
Scenario: Trend Analysis
By isolating topics, themes, entities or phrases, the retailer can examine how frequently they were mentioned. They can also view how customer sentiment regarding these terms changed alongside the frequency of their occurrence.
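The trend view described here amounts to grouping mentions of a term by time period and reporting both the count and the average sentiment. A minimal sketch follows, assuming each review has already been tagged with a month, a topic and a sentiment score; the records and the `topic_trend` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical review records: (month, topic, sentiment score).
reviews = [
    ("2013-01", "Technology", 0.40),
    ("2013-01", "Technology", -0.20),
    ("2013-01", "Travel", 0.30),
    ("2013-02", "Technology", 0.10),
    ("2013-02", "Technology", 0.50),
    ("2013-02", "Technology", 0.30),
]


def topic_trend(records, topic):
    """Return {month: (mention_count, mean_sentiment)} for one topic."""
    by_month = defaultdict(list)
    for month, rec_topic, score in records:
        if rec_topic == topic:
            by_month[month].append(score)
    # Sort by month so the result reads as a time series.
    return {
        month: (len(scores), sum(scores) / len(scores))
        for month, scores in sorted(by_month.items())
    }


for month, (count, mean) in topic_trend(reviews, "Technology").items():
    print(f"{month}: mentions={count} mean_sentiment={mean:.2f}")
```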
Scenario: Association Discovery
Using the Association Map, the retailer can visually determine the frequency with which certain terms occur, and how closely they relate to other terms used in customer reviews. The retailer can quickly assess how well certain subjects are received, and how much relative interest their customers have in those subjects.
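The two quantities behind such a map, how often a term occurs and how often two terms occur together, can be sketched with simple counting. The example assumes the terms in each review have already been extracted; the sample data and the `cooccurrence` and `jaccard` helpers are hypothetical, and the Jaccard ratio is just one possible measure of how closely two terms relate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of an earlier extraction step: terms found in each review.
review_terms = [
    {"plot", "characters", "price"},
    {"plot", "characters"},
    {"price", "shipping"},
    {"plot", "price"},
]


def cooccurrence(term_sets):
    """Count how often each term, and each pair of terms, appears in a review."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in term_sets:
        term_counts.update(terms)
        # Sort so that each pair is always stored in the same order.
        pair_counts.update(combinations(sorted(terms), 2))
    return term_counts, pair_counts


def jaccard(a, b, term_counts, pair_counts):
    """Reviews containing both terms divided by reviews containing either."""
    both = pair_counts[tuple(sorted((a, b)))]
    either = term_counts[a] + term_counts[b] - both
    return both / either if either else 0.0


term_counts, pair_counts = cooccurrence(review_terms)
print(term_counts.most_common())
print(pair_counts.most_common())
print(jaccard("plot", "characters", term_counts, pair_counts))
```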
Scenario: Document Summary
Individual terms can be isolated, as well as the sentences and documents that reference them, giving you a detailed look at the context used in reviews. Each text record can be completely isolated for a full examination of the content and sentiment contained within.
Scenario: Decision Tree
KnowledgeREADER can be used to analyze the output of your text analysis with structured data, and use data mining and predictive analytics techniques to expand customer insights. In this example, the retailer has created a Decision Tree that allows them to determine the price breakdown across book genres. The Decision Tree uses High and Low price brackets to segment genres. The retailer can now determine if there is a correlation between price, genre and overall sentiment. They may use these insights to inform product inventory or pricing decisions.

Decision Tree (target: Price; split on rank_1_topic)
Root: High 26,820 (14.21%); Low 161,985 (85.79%); Total 188,805 (100.00%)
- null, Automotive, Hotels, Video Games, Weather: High 8,662 (10.94%); Low 70,489 (89.06%); Total 79,151 (41.92%)
- Advertising, Aviation, Education, Investing, Law, Religion: High 5,906 (17.57%); Low 27,711 (82.43%); Total 33,617 (17.81%)
- Agriculture, Art, Biotechnology, Crime, Disasters, Food, Politics, Space, Sports: High 3,037 (13.43%); Low 19,583 (86.57%); Total 22,620 (11.98%)
- Banking, Beverages, Marriage, Real Estate, Renewable Energy, Robotics, Travel: High 1,136 (9.16%); Low 11,270 (90.84%); Total 12,406 (6.57%)
- Business, Economics, Mobile Devices: High 1,862 (19.87%); Low 7,509 (80.13%); Total 9,371 (4.96%)
- Elections, Fashion, Intellectual Property, Labor, Popular Culture: High 889 (15.18%); Low 4,968 (84.82%); Total 5,857 (3.10%)
- Environment, Social Media: High 551 (23.93%); Low 1,752 (76.07%); Total 2,303 (1.22%)
- Hardware: High 92 (30.77%); Low 207 (69.23%); Total 299 (0.16%)
- Health, Traditional Energy: High 853 (11.83%); Low 6,356 (88.17%); Total 7,209 (3.82%)
- Science, Technology, War: High 3,064 (21.83%); Low 10,971 (78.17%); Total 14,035 (7.43%)
- Software and Internet: High 768 (39.65%); Low 1,169 (60.35%); Total 1,937 (1.03%)
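The percentages in each node follow directly from the counts: the High and Low shares are taken within the node, and the Total share is the node's size relative to all 188,805 reviews. The sketch below recomputes three nodes from the counts on this slide; the `node_summary` function is a hypothetical helper, and it shows only the arithmetic, not how the tree chooses its splits.

```python
# Total number of reviews at the root of the tree (from the slide).
ROOT_TOTAL = 188_805

# (High, Low) review counts for three of the rank_1_topic nodes on the slide.
nodes = {
    "Hardware": (92, 207),
    "Software and Internet": (768, 1_169),
    "Banking ... Travel": (1_136, 11_270),
}


def node_summary(high, low, root_total=ROOT_TOTAL):
    """Return (high_pct, low_pct, share_of_all_pct), each rounded to 2 decimals."""
    total = high + low
    return (
        round(100 * high / total, 2),
        round(100 * low / total, 2),
        round(100 * total / root_total, 2),
    )


for name, (high, low) in nodes.items():
    high_pct, low_pct, share = node_summary(high, low)
    print(f"{name}: High {high_pct}%, Low {low_pct}%, share of all reviews {share}%")
```

Read this way, Software and Internet has the largest share of High-priced titles (39.65%) even though it covers only about 1% of all reviews.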
Scenario: Strategy Tree
KnowledgeREADER can be used to build and deploy predictive strategies with Strategy Trees.

Here, the retailer has identified segments based on price and genre. In addition, they can track key metrics that drive store performance. Combined with the text analysis output, this measures the average sentiment, rating, sale price and the most common themes discussed in each segment. By associating a treatment with each segment, the retailer can automatically assign specific actions or activities to each segment.

Now, the book retailer can quickly turn insight into action.

Strategy Tree (figure data)
All reviews: Total 188,805 (100.00%); Avg Rating 4.20; Avg Sale Price $15.05; Avg Sentiment 0.22; Most Common Phrase: great
Split on word_count:
- null: Total 17 (0.01%); Avg Rating 4.65; Avg Sale Price $13.24; Avg Sentiment null; Most Common Phrase: null; Treatment: Ignore
- [1,5644]: Total 188,788 (99.99%); Avg Rating 4.20; Avg Sale Price $15.05; Avg Sentiment 0.22; Most Common Phrase: great
Split on rating (within word_count [1,5644]):
- [1,4]: Total 75,890 (40.19%); Avg Rating 3.01; Avg Sale Price $14.71; Avg Sentiment 0.12; Most Common Phrase: great; Treatment: Ignore
- 5: Price High 16,414 (14.54%); Low 96,484 (85.46%); Total 112,898 (59.80%); Avg Rating 5.00; Avg Sale Price $15.27; Avg Sentiment 0.30; Most Common Phrase: wonderful
Split on rank_1_topic (within rating 5; every segment has Avg Rating 5.00):
- null, Beverages, Hotels, Real Estate, Video Games, Weather: High 5,208 (10.67%); Low 43,581 (89.33%); Total 48,789 (25.84%); Avg Sale Price $13.45; Avg Sentiment 0.34; Most Common Phrase: wonderful; Treatment: E-Mail BOGO
- Advertising, Aviation, Business, Economics: High 1,211 (21.10%); Low 4,529 (78.90%); Total 5,740 (3.04%); Avg Sale Price $18.65; Avg Sentiment 0.30; Most Common Phrase: great; Treatment: E-Mail New Hot Reads
- Agriculture, Art, Crime, Disasters, Health, Space, Sports, Traditional Energy: High 1,755 (13.67%); Low 11,079 (86.33%); Total 12,834 (6.80%); Avg Sale Price $15.32; Avg Sentiment 0.24; Most Common Phrase: wonderful; Treatment: E-Mail Buy 3 Get 4th Free
- Automotive, Banking, Marriage, Renewable Energy, Robotics, Travel: High 618 (9.61%); Low 5,813 (90.39%); Total 6,431 (3.41%); Avg Sale Price $12.40; Avg Sentiment 0.26; Most Common Phrase: wonderful; Treatment: E-Mail BOGO
- Biotechnology, Elections, Science, Technology, War: High 1,819 (22.68%); Low 6,200 (77.32%); Total 8,019 (4.25%); Avg Sale Price $17.98; Avg Sentiment 0.23; Most Common Phrase: wonderful; Treatment: E-Mail 25% Off Coupon
- Education, Intellectual Property, Labor, Law, Religion: High 3,725 (18.14%); Low 16,815 (81.86%); Total 20,540 (10.88%); Avg Sale Price $17.05; Avg Sentiment 0.27; Most Common Phrase: wonderful; Treatment: E-Mail New Hot Reads
- Environment, Hardware: High 369 (25.31%); Low 1,089 (74.69%); Total 1,458 (0.77%); Avg Sale Price $20.97; Avg Sentiment 0.24; Most Common Phrase: wonderful; Treatment: E-Mail 25% Off Coupon
- Fashion, Food, Investing, Mobile Devices, Politics, Popular Culture: High 1,295 (15.99%); Low 6,802 (84.01%); Total 8,097 (4.29%); Avg Sale Price $16.66; Avg Sentiment 0.29; Most Common Phrase: wonderful; Treatment: E-Mail New Hot Reads
- Social Media, Software and Internet: High 414 (41.82%); Low 576 (58.18%); Total 990 (0.52%); Avg Sale Price $25.10; Avg Sentiment 0.34; Most Common Phrase: great; Treatment: E-Mail 25% Off Coupon
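Associating a treatment with each segment can be pictured as a small rule function that walks the same splits as the tree. The sketch below is an assumption-laden reading of the Strategy Tree figure: the split order (word_count, then rating, then rank_1_topic) and the topic groupings are inferred from the figure data, `assign_treatment` is a hypothetical name, and the fallback for unlisted topics is an assumption that does not appear in the source.

```python
# Topic groups and their treatments, as read from the Strategy Tree figure.
# None stands for the "null" topic shown in the figure.
TREATMENTS = [
    ({None, "Beverages", "Hotels", "Real Estate", "Video Games", "Weather"},
     "E-Mail BOGO"),
    ({"Advertising", "Aviation", "Business", "Economics"},
     "E-Mail New Hot Reads"),
    ({"Agriculture", "Art", "Crime", "Disasters", "Health", "Space", "Sports",
      "Traditional Energy"},
     "E-Mail Buy 3 Get 4th Free"),
    ({"Automotive", "Banking", "Marriage", "Renewable Energy", "Robotics", "Travel"},
     "E-Mail BOGO"),
    ({"Biotechnology", "Elections", "Science", "Technology", "War"},
     "E-Mail 25% Off Coupon"),
    ({"Education", "Intellectual Property", "Labor", "Law", "Religion"},
     "E-Mail New Hot Reads"),
    ({"Environment", "Hardware"},
     "E-Mail 25% Off Coupon"),
    ({"Fashion", "Food", "Investing", "Mobile Devices", "Politics", "Popular Culture"},
     "E-Mail New Hot Reads"),
    ({"Social Media", "Software and Internet"},
     "E-Mail 25% Off Coupon"),
]


def assign_treatment(word_count, rating, top_topic):
    """Return the treatment for one review, following the tree's splits in order."""
    if word_count is None:   # reviews with no text: the null word_count branch
        return "Ignore"
    if rating < 5:           # the rating [1,4] branch
        return "Ignore"
    for topics, treatment in TREATMENTS:
        if top_topic in topics:
            return treatment
    return "Ignore"          # assumption: topics not in the figure get no treatment


print(assign_treatment(120, 5, "Hotels"))
print(assign_treatment(120, 3, "Hotels"))
print(assign_treatment(80, 5, "Hardware"))
```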
Scenario: Angoss KnowledgeREADER
KnowledgeREADER is an industry-first software application that brings a new age of integrated customer intelligence by combining visual text discovery and sentiment analysis with the power of predictive analytics. Now, customer intelligence professionals and marketers can easily understand and model customer feedback without relying on data analysts. KnowledgeREADER delivers unparalleled customer intelligence and voice of the customer insights to support customer experience management above and beyond what text analytics users have come to expect.
Learn more about KnowledgeREADER
Resources
Video: Quick Tour of KnowledgeREADER
Articles:
- Voice of the Customer, How to Move Beyond Listening to Action
- Text Analytics Categorization and Concept Topics
- Text Analytics Phrase and Theme Extraction
- Text Analytics Sentiment Extraction
- Text Analytics Named Entity Extraction
Brochure: KnowledgeREADER
Web: KnowledgeREADER
About Angoss
Angoss Software Corporation is a global leader in delivering business intelligence software and predictive analytics to businesses looking to improve performance across sales, marketing and risk. With a suite of desktop, client-server and big data software products and Cloud solutions, Angoss delivers powerful approaches to turn information into actionable business decisions and competitive advantage. Angoss software products and solutions are user-friendly and agile, making predictive analytics accessible and easy to use. For more information visit www.angoss.com.