1 Alexander Lang Business Analytics, IBM IBM Social Media Analytics Text Analysis on Big Data
2 Agenda Social Media Analytics: Scope and Myths What to measure in Social? Analysis approaches and Challenges Our Text Analysis Environment 2
3 Three areas of Social Media Analytics Content Tweets, forum posts, blogs, video comments,... Files shared in Collaboration Suites People Geograpic, Demographic, Behavioral Profiles Expertise and Influence Today s focus Relationships How does content spread? How do people interact?
4 Some myths about Social Media Analytics It s all about twitter and facebook Buying decisions are often made based on reviews, blog entries, forum discussions It s all about detecting the next outrage Detecting consumer sentiment is just one of many insights Social media is a good way to learn what people like about products / services Predict elections, sales demand,...by looking at social media alone Social media can be a valuable addition, but never a replacement for planning, surveys,... Social analysis results need to be integrated with internal data for more relevance You need to analyze petabytes in nanoseconds for relevant results It s the analysis depth that counts some analyses yield only 100K data points The right time to deliver analysis results is when the customer has enough data to make a decision not before
5 Social Media Analytics requires more than Analytics Consumability Meaningful Social Media Metrics Driven by the Line of Business, not IT Deliver results to more than one employee Capability Influencer Identification Network / Graph Analysis Statistics - Affinity patterns Text Analysis / NLP - Brands, Products, Product Features - Sentiment, Mood, Emotion - Author Location, Demographics, Behavior, Personality Traits - Topic Clusters Data Integration -Author profile matching -Correlation with internal KPIs 5
6 Agenda Social Media Analytics: Scope and Myths What to measure in Social? It depends on who you ask... Analysis approaches and Challenges Our Text Analysis Environment 6
7 Marketing: I want to enhance my reach in social media Competitor 1 Competitor 2 MyCompany I lag behind my peers Competitor 1 Competitor 2 MyCompany What s tripadvisor doing here and why am I missing?
8 Marketing: I want to find new sales channels for my retail brand 8 Insight Action Proximity to a Healthy Food store is seen as a plus Start co-marketing activities with certain hotels to pull healthy food shoppers to my store locations
9 Public Relations: I want to protect my brand reputation Competitor 1 Action Check own supply chain to pro-actively avoid this problem Prepare statement to clarify that your brand is not affected
10 Sales: I want to avoid customer churn or identify sales leads Company One Thilo Company Two Identify engagement signals... Thilo CSR 2 Alternatives...before your competitor does Thilo Thilo Next step: Engage Social is only one view of the customer
11 Product Management: I want feedback on what consumers like/dislike around the competition 30,00% 25,00% 20,00% 15,00% 10,00% 5,00% 0,00% Discussions by Topic/Concern 24,20% 7,40% 1,9% 1,50% 1,2% 0,50% Smell Stains Hygiene Palm oil Allergies Cold wash Drill down 60,0% 50,0% 40,0% 30,0% 20,0% 10,0% 0,0% Seriously, Seriously, I I definitely definitely can t can t recommend [brand A] to anyone recommend [brand A] to anyone with with a a sensitive sensitive nose! nose! After After 2 2 wash loads, the whole house wash loads, the whole house smells urgh!! smells urgh!! Smell Related Discussions by Brand Competitive Brands A B C D E F G H I Have again bought [brand E]. I just can t Have again bought [brand E]. I just can t take take the the smell smell of of [brand [brand A] A] any any more more and and fortunately the bottle is almost empty. fortunately the bottle is almost empty. Somehow, Somehow, my my clothes clothes start start smelling smelling kind kind of sour after a couple of days in the closet, of sour after a couple of days in the closet, when I ve used [brand A] when I ve used [brand A]
12 A Social Media Framework defining and grouping Social Media KPIs Social Media Impact Are we marking the right investments in products, services, markets, campaigns, partners? Measures: Share of Voice Reach Sentiment Social Media Discovery What new topics are emerging in social media? Measures Topics Participants Sites Social Media Segmentation Who are influencers and advocates? How can I identify sales leads from social media activity? Who are the customers I should reach out to? Measures: Geographics,Demographics Users, Prospective Users Advocates, Detractors Influence Social Media Relationships What is driving social media behaviour? Measures: Affinity Association Cause 12
13 How does it work? Inside IBM Social Media Analytics 1.2 SSO Welcome Page Multitenancy Administration Analysis Reporting Project Project Project Security Project Project Blogs Message boards Twitter News Facebook Analytics Platform Topic Detection Text Analytics IBM BigInsights 1.3 Search Index Influencer Identification DB2 ESE 9.7 Cognos Reviews Video 13 E2E Orchestration IBM Social Media Analytics
14 Agenda Social Media Analytics: Scope and Myths What to measure in Social? Analysis approaches and Challenges Our Text Analysis Environment 14
15 1 5 Text Analytics for Social Media Analysis Goal: extract information from what users write to Aggregate information into meaningful statistics for end users, as well as extract Supporting evidence for the statistics presented to users. Types of information include Sentiment: are users writing positively or negatively about the product Demographics: gender, age, family status, geographic information... Author behavior : are they recommending or cautioning against the product, are they owners or potential buyers of the product...
16 Rules vs. Machine Learning the advantages of Rules High expressiveness: phenomena like comparisons are straightforward to express in rules Smaller amount of human-coded training data: Smaller adaption effort to new domains and languages Clear lineage: If it doesn't work, we can understand why and can fix it quickly even after several iterations Transparency for our users More fine-grained: detect sentiment for a particular product, not a whole tweet Statistical approaches help us to build rules
17 The challenge in Social Media... OBI in the German DIY world OBI in the rest of the world The F50 Also the F50 17
18 Capturing concepts (such as brands or products) Simple keywords are not enough you sometimes need regular expressions to capture all variations Define concepts through include, context and exclude terms Include terms make up the concept (including synonyms) Context terms describe relevant contexts Exclude terms rule out irrelevant meanings Examples: Only match Obi when neither Wan, nor Kenobi, nor star wars are present (exclude) Only match F50 when sports or running or adidas are also present (context)
19 Detecting Sentiment in English, German, Spanish, French, Chinese (traditional + simplified), Dutch,... Goal & Key challenge: Only pick up sentiment that is relevant to a concept Yesterday I had a sports massage which was wonderful. So I went running with my new Running XYs but got blisters Aggregate the sentiment for each concept mention Positive: concept mention contains more positive than negative sentiment for the concept I've had a Phone A for a bit less than a month now and it's pretty sweet Negative: vice versa While I like my Phone A (despite it's many flaws) I am not feeling all that confident that I'll see Gingerbread on my device. Ambivalent: equal amount of positive and negative sentiment The battery on Phone A is good, but the charging time could be better Neutral: concept mention doesn t contain any sentiment around the concept On-device debug with Phone A USB driver. You need to install the device-specific driver in addition to the SDK 19
20 Configuring Sentiments for SMA Administrators Add or remove positive or negative sentiment terms & sentiment blockers De-activate sentiment terms Term is kept in the sentiment list, but is not applied in snippets Useful to keep terms around No configuration of grammar rules
21 Steps to detect sentiment a rule-based approach 1. Detect positive and negative terms love, sweet blisters, flaws 2. Remove terms that are covered by sentiment blockers issue vs. January issue 3. Apply syntax rules to determine negation, desires, questions... I m confident vs. I m not confident a problem vs. they solved the problem they improved their service vs. they should improve... They are good vs. Are they good? 4. Pick the sentiment phrases that are close to a concept Can be based on source (e.g., blogs vs. reviews), proximity, grammatical constraints... 21
22 Example: Concept-level sentiment around the Sony Xperia Z
23 Identifying author demographics Gender Identify gender through cues from the author s first name, the author s nickname and the author content Is author married (en, de, es, fr) Identified in author content through trigger terms and text analysis rules Is author a parent (en, de, es, fr) Identifed in author content through trigger terms and text analysis rules nicknames can be a good source of information as well ( SuperMom2012 ) 23
24 Identifying author behavior (en, de, es, fr) Users of a certain product or service What product features are relevant for them? Recommenders E.g., authors mentioning you should use X Detractors e.g. authors mentioning stay away from X Prospective users Potential sales leads for 1:1 engagement Identify sites where prospective users congregate 24
25 2 5 One road ahead: Deeper author-based insights
26 Topic Detection in Social Media Goal: find lists of keywords (=topics) that allow to reconstruct a social media post through a combination of topics Approach: Non-Negative Matrix Factorization Advantage over document clustering: focus is on getting representative topic keywords, which helps the user to understand what he documents are about, not perfect document clusters Factorization problem: term K topics term document document K topics Document-Term-Matrix X Document-Topic-Matrix W Topic-Term-Matrix H
27 Agenda Social Media Analytics: Scope and Myths What to measure in Social? Analysis approaches and Challenges Our Text Analysis Environment 27
28 Text Analytics Architecture an IBM view Named entity extractors, Sentiment, Log analysis, Table extraction, etc. Scalable Runtime Tooling + UI Customizable Libraries Extraction Language Compiler + Optimizer Execution Engine Core Operations Various levels of users Simplicity vs. expressiveness Tokenization, Lemmatization, POS tagging, Sentence boundary detection, Deep parsing, 28
29 Architecture Implementation: IBM System T Eclipse tooling: editor, workflow analysis, pattern discovery, regex learner, profiler, AQL Scalable Embeddable Runtime Tooling + UI Generic Libraries Extraction Language Compiler + Optimizer Execution Engine Core Operations NE libraries, Sentiment, informal-text normalizer, Cost based optimizer 20+ languages currently supported 29
30 AQL: A Declarative Language to Specify Extraction Patterns create view Number as extract regex /\d+(\.\d+)?/ on D.text as match from Document D; Asia-Pacific revenues increased 7 percent (5 percent, adjusting for currency) to $4.8 billion. OEM revenues were $1.0 billion, down 3 percent compared with the 2005 fourth quarter. Choice of SQL-like syntax for AQL motivated by wider adoption of SQL
31 AQL example: Dictionary Match create view Unit as extract dictionary UnitDict' with flags 'IgnoreCase' on D.text as match from Document D; Asia-Pacific revenues increased 7 percent (5 percent, adjusting for currency) to $4.8 billion. OEM revenues were $1.0 billion, down 3 percent compared with the 2005 fourth quarter.
32 AQL example: matching sequences create view AmountWithUnit as extract pattern <N.match> <U.match> as match from Number N, Unit U; Asia-Pacific revenues increased 7 percent (5 percent, adjusting for currency) to $4.8 billion. OEM revenues were $1.0 billion, down 3 percent compared with the 2005 fourth quarter.
33 AQL expressiveness Similar to the standard relational model used by SQL databases like DB2 All data in AQL is stored in tuples: data records of one or more columns/fields Basic extraction constructs EXTRACT statement Regular expression Dictionaries Sequence pattern Relational-style constructs SELECT JOIN UNION ALL and MINUS statements Aggregation operators CONSOLIDATE BLOCK
34 AQL Eclipse Tools Overview AQL Editor Ease of Programming AQL Editor: syntax highlighting, auto-complete, AQL Editor: syntax highlighting, auto-complete, hyperlink navigation hyperlink navigation Result Viewer: visualize/compare/evaluate Result Viewer: visualize/compare/evaluate Explain: show how each result was generated Explain: show how each result was generated Workflow UI: enable novice users to become Workflow UI: enable novice users to become experts in a short time experts in a short time Result Viewer Explain Automatic Discovery Pattern Discovery: identify patterns in the data Pattern Discovery: identify patterns in the data Regex Generator: generate regular Regex Generator: generate regular expressions from examples expressions from examples Regex Learner Performance Tuning Profiler: identify performance bottlenecks to be Profiler: identify performance bottlenecks to be hand tuned hand tuned
35 The SystemT Optimizer AQL Language Optimizer Compiled Plan Many different possible ways to execute a given set of AQL statements The SystemT Optimizer chooses a good plan from among the alternatives Employs multiple techniques AQL rewrite rules Cost-based optimization Global plan rewrite rules Input Document BigInsights Cluster Extracted objects
36 What is an operator graph? Input Document AQL Language Optimizer Compiled Plan BigInsights Cluster Extracted objects An extractor plan is a graph of operators Operator: A module that performs a specific task, like identifying matches of a regular expression on a string The output of one operator becomes the input of another
37 Example Operator Graph Output Tuples ApplyFunc Join create view Number as extract regex /\d+/ on between 1 and 1 tokens in D.text as match from Document D; create view Unit as extract dictionary UnitDict on D.text as match from Document D; Dictionary Regex create view AmountWithUnit as extract pattern <N.match> <U.match> return group 0 as match from Number N, Unit U; Document Text
38 Example Optimization: Conditional Evaluation (CE) Leverage document-at-atime processing Don t evaluate the inner operand of a join if the outer has no results John Smith at NLJoin John Smith Dictionary Regex Don t evaluate this Regex when there are no dictionary matches. John Smith at
39 Profiler Output: Hot Views Top 25 Views by Execution Time: View Name Samples Seconds % of Time DateISO DateISOExtended CodeCharNumSymBaseUnfilered DateNormalized Time4Follows3\u2761subquery Views whose compiled plans are responsible for the largest fraction of execution time The view at the bottom of the list is the most expensive
40 Summary Social Media Analytics covers Content, People and Relationships found in Social Media Data Social Media Analytics is relevant to several enterprise users across PR, Marketing, Sales, Product management, Brand Strategy Different KPIs are relevant to different people it s not always about sentiment Requires more than just a set of technology blocks One key technology for Social Media Analysis is Text Analysis Content-level insights like brands, products, features, sentiment Author-level insights like location, demographics, behavior We re ALWAYS interested in interns: 05.ibm.com/employment/de/studenten/jobs/jobs_prakti_software.html
41 Additional Information IBM Social Media Analytics IBM Social Media Analytics product videos Integrating social data with BI and Predictive Analytics 41
Text Analytics: The Victory Index Report SAS VICTORY Index d o u b l e v i c t o r Fern Halper, Ph.D Partner and Principal Analyst Marcia Kaufman COO and Principal Analyst Daniel Kirsh Senior Analyst Table
WHITEPAPER Big data in banking for marketers How to derive value from big data B NK 2020 PAGE 1 INNOVATION LAB INNOVATION LAB FOREWORD In Marketing & Sales the main strategic goals are to acquire new customers,
Compliments of 2nd IBM Limited Edition Business Analytics in Retail Learn to: Put knowledge into action to drive higher sales Use advanced analytics for better response Tailor consumer shopping experiences
How to embrace Big Data A methodology to look at the new technology Contents 2 Big Data in a nutshell 3 Big data in Italy 3 Data volume is not an issue 4 Italian firms embrace Big Data 4 Big Data strategies
An Oracle White Paper March 2013 Big Data Analytics Advanced Analytics in Oracle Database Advanced Analytics in Oracle Database Disclaimer The following is intended to outline our general product direction.
For Big Data Analytics There s No Such Thing as Too Big The Compelling Economics and Technology of Big Data Computing March 2012 By: 4syth.com Emerging big data thought leaders Forsyth Communications 2012.
White paper Customer Relationship.. Management.. Improving customer interactions with this powerful technology Executive Summary As we move further into an era when the manipulation and assessment of data
TDWI research First Quarter 2014 BEST PRACTICES REPORT Predictive Analytics for Business Advantage By Fern Halper Co-sponsored by: tdwi.org TDWI research BEST PRACTICES REPORT First Quarter 2014 Predictive
1 Contents Introduction. 1 View Point Phil Shelley, CTO, Sears Holdings Making it Real Industry Use Cases Retail Extreme Personalization. 6 Airlines Smart Pricing. 9 Auto Warranty and Insurance Efficiency.
Contents Introduction... 2 Overview of Best Practices... 2 Focus on Your Core Metrics... 2 Meaningful Business Metrics... 3 The 80/20 Rule... 4 Common Metrics... 4 Timely Data... 6 Report Segmentation...
a report by harvard business review analytic services The New Conversation: Taking Social Media from Talk to Action Sponsored by Conventional marketing wisdom long held that a dissatisfied customer tells
03 Market Research What s inside: We begin with an introduction, and then it s into the key terms and concepts of market research, quantitative and qualitative research, how to go about gathering data,
TDWI RESE A RCH TDWI BEST PRACTICES REPORT THIRD QUARTER 2011 SELF-SERVICE BUSINESS INTELLIGENCE Empowering Users to Generate Insights By Claudia Imhoff and Colin White CO-SPONSORED BY tdwi.org Third QUARTER
IBM Software Thought Leadership White Paper June 2013 The top five ways to get started with big data 2 The top five ways to get started with big data Big data: A high-stakes opportunity Remember what life
COMMUNITY EBOOK / FEBRUARY 2012 / www.radian6.com / 1 888 6radian Social Media Strategy for Government Organizations www.radian6.com Copyright 2012 - Radian6 Technologies Government Industry Ebook / 2012
CRISP-DM 1.0 Step-by-step data mining guide Pete Chapman (NCR), Julian Clinton (SPSS), Randy Kerber (NCR), Thomas Khabaza (SPSS), Thomas Reinartz (DaimlerChrysler), Colin Shearer (SPSS) and Rüdiger Wirth
DRAFT VERSION Big Data privacy principles under pressure September 2013 2 Contents Summary... 6 1 Introduction... 8 1.1 Problems for discussion... 8 1.2 Definitions... 9 1.2.1 Big Data... 9 1.2.2 Personal
TABLE OF CONTENTS Introduction... 3 The Importance of Triplestores... 4 Why Triplestores... 5 The Top 8 Things You Should Know When Considering a Triplestore... 9 Inferencing... 9 Integration with Text
Social Media in Recruiting Using New Channels To Source Talent Benchmark Research White Paper Aligning Business and IT To Improve Performance Ventana Research 2603 Camino Ramon, Suite 200 San Ramon, CA
OPEN DATA CENTER ALLIANCE : sm Big Data Consumer Guide SM Table of Contents Legal Notice...3 Executive Summary...4 Introduction...5 Objective...5 Big Data 101...5 Defining Big Data...5 Big Data Evolution...7
IBM Software Business Analytics Retail Delivering a Smarter Shopping Experience with Predictive Analytics: Innovative Retail Strategies Delivering a Smarter Shopping Experience with Predictive Analytics:
dimension data s 2013/14 global contact centre benchmarking summary report 2013/14 Copyright notice Dimension Data 2009 2013 Copyright and rights in databases subsist in this work. Any unauthorised copying,
INTELLIGENT BUSINESS STRATEGIES W H I T E P A P E R Architecting A Big Data Platform for Analytics By Mike Ferguson Intelligent Business Strategies October 2012 Prepared for: Table of Contents Introduction...
Digital Shopper Relevancy Research Report 2014 OPEN @ Understanding and anticipating the changing perspectives of Digital Customers Introduction» Understanding Shopper Psychology» Digital Shopping Evolution
WHITE PAPER Turning Insight Into Action The Journey to Social Media Intelligence Turning Insight Into Action The Journey to Social Media Intelligence From Data to Decisions Social media generates an enormous