SAS Text Analytics

Mary-Elizabeth (M-E) Eddlestone
SAS Customer Loyalty
M-E.Eddlestone@sas.com
+1 (607) 256-7929

Why are Organizations Interested?

Text Analytics 2009: User Perspectives on Solutions and Providers, Seth Grimes

Integration of Risk Management

Source: Global Risk Management Survey, Sixth Edition, by Deloitte, 2009

Unstructured and Semi-structured Data: The Dark Matter for IT

[Pie chart. Segment labels: unstructured data, structured data, semistructured data. Percentages shown: 25%, 5%, 70%.]

Text Analytics Basics

Most everything people do with electronic documents falls into one of four classes:
  1. Compose, publish, manage, and archive
  2. Index and search
  3. Categorize and classify according to metadata and contents
  4. Summarize and extract information

Text Analytics 2009, Seth Grimes, An Alta Plana Research Study

How do you know?

Vice Chancellor Samuel Ray Jones of North Carolina State University announced that his left arm had been severed accidently in a bazaar incident as he left his vehicel.

SAS Text Analytics

Information Organization and Access
  - SAS Enterprise Content Categorization
  - SAS Ontology Management
Predictive Modeling, Discover Trends and Patterns
  - SAS Text Miner
  - SAS Sentiment Analysis

SAS Text Analytics: Integration of Text Mining, Sentiment Analysis and Content Categorization

SAS Text Miner
  - Explore large volumes of text
  - Concept Linking
  - Clustering
  - Merge with structured data for:
      - Segment Profiling
      - Prediction
Natural Language Processing
  - Part-of-speech tagging
  - Stemming
  - Tokenization
  - Phrase Recognition
  - Entity Extraction
  - 30+ languages
SAS Sentiment Analysis
  - Identifies overall and feature-level sentiment
  - Combines statistical models and business rules
  - Automatically scores sentiment of new documents
SAS Content Categorization
  - Adds metadata to content for easier search and retrieval
  - Builds taxonomies through a rules engine
  - Automatically categorizes incoming documents

Language Detection

Language               Frequency   Percent   Cumulative Frequency   Cumulative Percent
Arabic                         2      0.07                      2                 0.07
Chinese (simplified)           5      0.18                      9                 0.33
Danish                         3      0.11                     12                 0.44
Dutch                         20      0.73                     32                 1.16
English                     2398     86.98                   2430                88.14
French                        19      0.69                   2449                88.83
German                        20      0.73                   2469                89.55
Italian                       10      0.36                   2479                89.92
Japanese                      32      1.16                   2511                91.08
Korean                        35      1.27                   2546                92.35
Norwegian                      2      0.07                   2548                92.42
Polish                         1      0.04                   2549                92.46
Portuguese                   131      4.75                   2680                97.21
Spanish                       75      2.72                   2755                99.93
Swedish                        2      0.07                   2757               100.00

What is Content Categorization?

Often used in conjunction with enabling better SEARCH
  - More relevant search is facilitated by creating taxonomies for content, associating metadata with the content, and automating the process to increase findability.
  - Consistency with automation: content tagging is often manual, redundant, and error-prone.
Classifying, tracking, and reporting of topics
  - How many documents were classified in these topic areas? Or mention these people or places?
  - How many times are drugs mentioned with these side effects?
  - Is this changing over time?

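As a rough illustration of rules-based categorization and the reporting questions above, the following Python sketch tags documents using a small keyword taxonomy and counts how many documents fall into each topic. The taxonomy, the sample documents, and the function names are all hypothetical; this is not SAS Enterprise Content Categorization rule syntax.

```python
import re
from collections import Counter

# Hypothetical taxonomy: category name -> keywords that trigger it.
TAXONOMY = {
    "Side effects": {"nausea", "headache", "dizziness"},
    "Banking": {"loan", "mortgage", "account"},
}

def categorize(text):
    """Return the sorted list of categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keywords in TAXONOMY.items() if words & keywords)

def report(documents):
    """Count how many documents were classified in each category."""
    counts = Counter()
    for doc in documents:
        counts.update(categorize(doc))
    return counts

# Made-up sample documents, for illustration only.
docs = [
    "Patient reported nausea and a mild headache.",
    "I opened a new account and asked about a mortgage.",
    "Dizziness after the second dose.",
]
counts = report(docs)
print(counts)
```

Adding a date to each document and grouping the counts by period would address the "Is this changing over time?" question in the same way.
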
ECC Example: New York Times Topics Pages

  - Automatically organize your content
  - Increase search engine optimization ranking
  - Topics
  - Automatic entities extraction
  - Automatic categorization

Social Media = Noisy Data

Actual Content of Data Provided by Major Bank
[Chart. Categories: Retailer, Arts/Sports, Romanian, Jobs, Phishing, Actually About Bank.]

Only 38% of the records pulled about the bank had anything to do with banking. Almost 58% of the records were definitely in Romanian; this number could be as high as 90%, however.

Reporting on Categories

Ontology Definition

  - A mapping of relationships
  - Way of organizing information across different fields or classification systems
  - Means of creating shared vocabulary and generating consistency across units
  - Integrating a collection of taxonomies

Potential Business Uses
  - Consolidation of vocabularies across departments
  - Mergers and acquisitions
  - Enhancement of search
  - Additional structure with metadata

Complexity of Ontologies

Ontologies range from simple taxonomies to highly tangled networks including constraints associated with concepts and relationships.

Light-weight
  - concepts
  - is-a hierarchy among concepts
  - relations between concepts
Heavy-weight
  - cardinality constraints
  - taxonomy of relations
  - axioms (restrictions)

Example: People Ontology

SAS Ontology Management

An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts.

Ontologies contain:
  - Classes: groups of objects
      - Categories or Concepts from Content Categorization projects
  - Slots: the metadata attributes
      - Link the rules across the concepts and are universal to each taxonomy, regardless of which project the taxonomy is stored in
  - Instances: specific objects
      - Assigned to the various Categories and Concepts
  - Value restrictions
      - Allowable values of attributes and relationships (of slots)

Example: An Ontology for Dogs

  - Classes: dog, poodle, terrier, collie, pit bull, chihuahua, ...
  - Slots: fur color, fur length, size, number of legs, region of origin, ...
  - Value restrictions (on slots):
      - Fur length = short, medium, long
      - Number of legs <= 4
  - Instances: Lassie, Petey, Gidget

The Cat

The Vet and Grandma associate different views with the concept "cat".

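The dog ontology above can be sketched as a small data structure. The following Python example is a minimal illustration only: the class hierarchy, the restriction predicates, and the slot values assigned to Lassie are assumptions made for the example, and the layout is not how SAS Ontology Management stores an ontology.

```python
# Classes form an is-a hierarchy: each class maps to its parent (None = root).
CLASSES = {
    "dog": None,
    "poodle": "dog",
    "terrier": "dog",
    "collie": "dog",
    "pit bull": "dog",
    "chihuahua": "dog",
}

# Value restrictions on slots, expressed as one predicate per slot.
RESTRICTIONS = {
    "fur_length": lambda v: v in {"short", "medium", "long"},
    "number_of_legs": lambda v: isinstance(v, int) and 0 <= v <= 4,
}

def is_a(cls, ancestor):
    """Walk up the hierarchy to test whether cls is a kind of ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = CLASSES[cls]
    return False

def violations(instance):
    """Return the names of slots whose values break a value restriction."""
    return [
        slot
        for slot, value in instance["slots"].items()
        if slot in RESTRICTIONS and not RESTRICTIONS[slot](value)
    ]

# Instances are specific objects; these slot values are illustrative only.
lassie = {"name": "Lassie", "class": "collie",
          "slots": {"fur_length": "long", "number_of_legs": 4}}
invalid = {"name": "Example", "class": "terrier",
           "slots": {"fur_length": "short", "number_of_legs": 5}}

print(is_a(lassie["class"], "dog"))   # collie is-a dog
print(violations(lassie))
print(violations(invalid))
```
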
What is Sentiment Analysis?

A process that identifies, analyzes, and interprets the attitudes, opinions, and emotions in digital content.

  - Statistical
  - Rules based
  - Hybrid: leverage both advanced analytics and human expertise

How is Sentiment Analysis Used?

Often, sentiment analysis is used in conjunction with evaluating the customer experience.

Unstructured text
  - Surveys
  - Call center notes
  - Social data
  - Chat sessions
Hotel experience
  - Service, Value, Bathroom, Beds, Room Size, Lobby, Concierge, Restaurants, Check In / Out, Fitness Center
Structured data
  - Area of the country: North, East, South, West
  - Traveler type: Business, Personal
  - Hotel type: Luxury, Standard, Economy

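To make the rules-based side of this concrete, here is a minimal Python sketch that scores a hotel review overall and per feature, using a few of the hotel-experience features listed above. The word lists, the feature keywords, the sample review, and the sentence-level scoring rule are all assumptions for illustration; a hybrid system would combine rules like these with a statistical model, which is not shown.

```python
import re

# Hypothetical sentiment lexicon and feature keywords.
POSITIVE = {"clean", "friendly", "comfortable", "great"}
NEGATIVE = {"dirty", "rude", "small", "slow"}
FEATURES = {
    "Bathroom": {"bathroom"},
    "Beds": {"bed", "beds"},
    "Lobby": {"lobby"},
    "Concierge": {"concierge"},
}

def score_review(review):
    """Return (overall score, score per feature) for one review.

    Each sentence scores +1 per positive word and -1 per negative word;
    the sentence score is credited to every feature the sentence mentions.
    """
    overall = 0
    by_feature = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z]+", sentence)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        overall += score
        for feature, keywords in FEATURES.items():
            if keywords & set(words):
                by_feature[feature] = by_feature.get(feature, 0) + score
    return overall, by_feature

review = ("The beds were comfortable. The bathroom was dirty and small. "
          "The concierge was friendly!")
overall, by_feature = score_review(review)
print(overall, by_feature)
```

Here the overall score nets out to zero even though the feature-level scores differ sharply, which is the reason for reporting sentiment per feature. The per-feature scores could then be joined to structured fields such as traveler type or hotel type for reporting.
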
SAS Text Miner

Text mining is the process of analyzing a corpus of documents, through Natural Language Processing and statistical methods, to uncover topics hidden within the documents.

Two General Goals of Text Mining

Exploration
  - Uncovering hidden themes and key concepts
  - Concept Linking
  - Clustering
Prediction
  - Classification
  - Identify which input variables are most influential to the value of a target variable
  - Scoring: derive a model or set of rules that produces a predicted target value for a given set of inputs

Identify and Count Word Occurrences

What are Concept Links?

The strength of association of two terms is computed and visually represented as a Concept Link.

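Both ideas above, counting word occurrences and computing the strength of association between two terms, can be sketched in a few lines of Python. The slides do not say how the association strength is computed, so this example substitutes a simple stand-in: the Jaccard overlap of the sets of documents that contain each term. The sample documents and function names are hypothetical.

```python
import re
from collections import Counter

# Made-up sample corpus, for illustration only.
docs = [
    "engine noise after oil change",
    "oil leak near engine",
    "brake noise when stopping",
]

def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Identify and count word occurrences across the whole corpus.
term_counts = Counter(t for d in docs for t in tokens(d))

def link_strength(a, b):
    """Jaccard overlap of the document sets containing term a and term b.

    This is an assumed stand-in measure, not the one SAS Text Miner uses.
    """
    docs_a = {i for i, d in enumerate(docs) if a in tokens(d)}
    docs_b = {i for i, d in enumerate(docs) if b in tokens(d)}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

print(term_counts.most_common(3))
print(link_strength("engine", "oil"))
print(link_strength("engine", "brake"))
```
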
What are Clusters?

Clustering involves finding groups of documents that are more similar to each other than they are to the rest of the documents in the collection. Once the clusters are determined, examining the words that occur in the cluster reveals the focus of the cluster.

SAS Sentiment Analysis Workbench

  - Creates word or phrase clouds
  - Data exploration and visualization

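The clustering idea described above, grouping documents that are more similar to each other than to the rest and then examining the words in each group, can be illustrated with a short Python sketch. It uses bag-of-words cosine similarity and a greedy single-pass grouping with an arbitrary threshold; this is an assumed stand-in chosen for brevity, not the clustering algorithm used by SAS Text Miner, and the sample documents are made up.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[t] * v[t] for t in u)  # a Counter returns 0 for missing terms
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy pass: join the first cluster whose seed document is similar enough."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # lists of document indexes; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def top_terms(docs, members, n=3):
    """Most frequent words in a cluster, which hint at its focus."""
    counts = Counter()
    for i in members:
        counts.update(vectorize(docs[i]))
    return [term for term, _ in counts.most_common(n)]

docs = [
    "room service was slow and the room was cold",
    "slow room service and a cold room",
    "flight delayed and luggage lost",
    "lost luggage after a delayed flight",
]
groups = cluster(docs)
print(groups)
print([top_terms(docs, g) for g in groups])
```

In practice common words such as "and" or "was" would be removed or down-weighted before computing similarity, so that the top terms reflect the topic rather than the grammar.
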
Compare Sentiment of Specific Features of Your Products vs. the Competition

Output from SAS Sentiment Analysis can be input to SAS BI for greater depth and flexibility of reporting.

The Synergy of SAS Text Analytics

The value of the individual SAS Text Analytics solutions is greatly enhanced when the solutions are used together to gain even greater insight.

Examples:
  - Enhancing the value of topics discovered and defined in SAS Enterprise Content Categorization by adding sentiment to them
  - Enhancing predictive modeling by adding sentiments discovered using SAS Sentiment Analysis

SAS Sentiment Analysis and SAS Content Categorization Used in Conjunction

  - Taxonomies can be highly customized for each customer to ensure best alignment and accuracy
  - SAS Content Categorization can be used to further clean, filter, and organize the raw data
  - SAS measures both document-level and attribute-level sentiment using a hybrid of statistical and rules-based methods

Predict Sentiment or NPS Scores Using New Sentiment Variables

Decision Tree With Sentiment Variables

New variables derived from SAS Sentiment Analysis turned out to be highly predictive in the decision tree, adding more lift.

SAS Text Analytics

Information Organization and Access
  - SAS Enterprise Content Categorization
  - SAS Ontology Management
Predictive Modeling, Discover Trends and Patterns
  - SAS Text Miner
  - SAS Sentiment Analysis

Thank you for being a valued SAS customer!