CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING



Mary-Elizabeth ("M-E") Eddlestone
Principal Systems Engineer, Analytics
SAS Customer Loyalty, SAS Institute, Inc.

Is there valuable information locked away in your unstructured data?

CURRENT SITUATION: COMMON QUESTIONS ABOUT TEXTUAL DATA SOURCES
- Are there hidden insights within text data sources that can help my organization? Examples: call center notes, emails, news, government filings, social media.
- How can I make use of our textual data sources? What value can they bring?
- Can I also use text data to analyze and predict the future, for example to reduce fraud, reduce churn, improve sales, or reduce costs?
- How can I combine unstructured and structured data sources, such as customer data plus customer feedback?
The goal: get the most out of text data.

WHAT IF YOU COULD...
- Extract key information from text data, e.g. people, places, companies?
- See how things are related to each other across a large number of documents and messages?
- Discover the main ideas and topics across all documents and messages?
- Find patterns across text and non-text data that can predict the future?

WHAT IF YOU COULD...
- Discover new insights from large text data sources
- Extract key patterns from text data to predict the future

Customers
- Discover current topics about your products from customer opinions
- Find patterns within customer feedback that predict good interest in upsell opportunities

Fraud
- Detect anomalies from the usual topics described in text reports, text applications or feedback
- Find patterns in reports that may predict or relate to suspicious behavior

Public Opinion
- Understand previously unknown issues and concerns from citizen discussions on Twitter and forums
- Extract key opinions from citizen feedback to forecast citizen sentiment in the near future

WHERE IS TEXT MINING USED?
Text mining has numerous applications in any industry.
- Government: Detect fraudulent activity. Spot emerging trends and public concerns.
- Finance: Retain the current customer base using call center transcriptions or transcribed audio. Identify potentially fraudulent activities.
- Insurance: Identify fraudulent claims. Track competitive intelligence. Brand management.
- Retail: Identify the most profitable customers and the underlying reasons for their loyalty. Brand management.
- Manufacturing: Reduce time to detect root cause of product issues. Identify trends in market segments.
- Telecommunications: Help prevent churn and suggest up-sell/cross-sell opportunities for individual customers.
- Life Sciences: Identify adverse events. Recommend appropriate research materials.

TEXT MINING

SAS TEXT ANALYTICS
- Domain-Driven (Information Organization and Access): SAS Enterprise Content Categorization, SAS Ontology Management
- Analysis-Driven (Predictive Modeling, Discover Trends and Patterns): SAS Text Miner, SAS Sentiment Analysis

SAS TEXT MINER
SAS Text Miner is a complete solution for discovering insights or predicting behaviour and outcomes. It draws on the data mining capabilities of SAS Enterprise Miner and on SAS natural language processing (NLP) and advanced linguistic technologies.
- What is Concept Extraction? Automatically locating and extracting the key information from documents, based on rules and advanced linguistic logic.
- What is Concept Linking? Looking within a large corpus of text documents to discover how concepts and key information are associated with each other.
- What is Topic Discovery? Analysing a large corpus of text documents to discover topics by grouping messages that have very similar content.

HOW DOES TEXT MINING WORK? EXPLORING & DISCOVERING INSIGHTS
1. Input text messages, e.g. Twitter data, reports, email, news, forum messages.
2. Parse and explore text data: break down the text and explore relationships between key concepts such as persons, places and organizations.
3. Discover topics: cluster documents of similar content and describe them with important key words.

HOW DOES TEXT MINING WORK? DISCOVER PATTERNS FOR PREDICTIVE MODELING
1. Input text messages with relevant structured data, e.g. email, call center notes, applications.
2. Parse text data and discover topics: break down the text into structured data and group messages of similar content.
3. Predictive modeling with text data: text data fed into models may provide reliable information for predicting outcomes and behavior.
Example (from the slide diagram): customer data, used to predict activity that is likely fraudulent.
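The step of feeding text into a model alongside structured data can be pictured as building one row per record that holds both kinds of input. The following is a minimal sketch under stated assumptions: the `claims` records, the `FLAG_TERMS` list and the `build_model_row` helper are all hypothetical, and simple term-presence flags stand in for whatever text-derived variables a real tool would produce.

```python
# Hypothetical records: structured fields plus a free-text note.
claims = [
    {"claim_id": 1, "amount": 1200.0, "note": "Rear bumper damaged in parking lot"},
    {"claim_id": 2, "amount": 9800.0, "note": "Vehicle stolen, no witnesses, no police report"},
]

# Hypothetical text-derived features: one 0/1 flag per term of interest.
FLAG_TERMS = ["stolen", "witnesses", "bumper"]


def build_model_row(record):
    """Combine structured fields with term-presence flags taken from the text."""
    words = set(record["note"].lower().replace(",", " ").split())
    row = {"claim_id": record["claim_id"], "amount": record["amount"]}
    for term in FLAG_TERMS:
        row["has_" + term] = int(term in words)
    return row


# Each row now carries structured and text-derived inputs side by side,
# ready to be passed to whichever modeling tool is in use.
model_rows = [build_model_row(r) for r in claims]
```

The resulting rows can then be treated like any other modeling table; the choice of model is left open here.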

WHAT CAN WE DISCOVER?
- Relationships between concepts described in a large corpus of text data: how are persons, places and organizations related?
- Topics mentioned in text data: what are the main topics mentioned? What are the rare topics?
- Patterns related to structured data: e.g. how is feedback related to customer purchase behavior?

EXAMPLE: DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA
From customer complaints to engineer logs to legal documents, it is a considerable challenge to draw insights from large amounts of information, and it is usually infeasible by manual means. It is even more difficult when we wish to detect concepts and patterns within the documents in order to find trends and detect high-risk events.
How can we analyse millions of documents quickly and identify key patterns and cases of high risk (e.g. risk of fraudulent activity)?
Sample complaint: THE DRIVER SIDE SEAT BELT SOMETIMES FAILS TO RETRACT. WHEN I PULLED THE BELT OUT, IT STAYED OUT AND WOULD NOT RETRACT. I INSPECTED THE AREA AND FOUND NO INTERFERENCE. THIS HAPPENED ON A SAT. I DROVE THE VEHICLE SAT. AND SUN WITH A FAULTY BELT. I CALLED THE DEALERS SERVICE DEPT. TOLD THEM THE PROBLEM BUT COULDN'T GET IN FOR A WEEK.

EXAMPLE: DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA
SAS Text Miner automates the manual comprehension of text documents. It uncovers relationships and trends among concepts mentioned across documents, allows drill-down analysis, and is integrated with predictive modeling within SAS Enterprise Miner.
In this example, we look at a large database of car faults. SAS Text Miner runs Text Parsing on thousands of car fault reports:
- Recognizing and extracting entities and parts of speech
- Supporting a wide range of languages
- Producing a detailed term/document matrix
- Allowing deeper analysis and visualization of insights
Car fault record: THE DRIVER SIDE SEAT BELT SOMETIMES FAILS TO RETRACT. WHEN I PULLED THE BELT OUT, IT STAYED OUT AND WOULD NOT RETRACT. I INSPECTED THE AREA AND FOUND NO INTERFERENCE
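To make the term/document matrix mentioned above concrete, here is a minimal sketch, assuming whitespace tokenization and raw counts. The `reports` list and the `row` helper are hypothetical, and this is not a description of how SAS Text Miner builds or weights its matrix.

```python
from collections import Counter

# Hypothetical, already lower-cased fault reports.
reports = [
    "the driver side seat belt sometimes fails to retract",
    "the belt stayed out and would not retract",
    "brake pedal feels soft and the brake light stays on",
]

# One Counter of term frequencies per document.
doc_counts = [Counter(text.split()) for text in reports]

# A sorted vocabulary gives the matrix a stable row order.
vocabulary = sorted(set().union(*doc_counts))

# term_doc[i][j] = number of times vocabulary[i] occurs in reports[j].
term_doc = [[counts[term] for counts in doc_counts] for term in vocabulary]


def row(term):
    """Return the per-document counts for one term."""
    return term_doc[vocabulary.index(term)]
```

Each row is a term and each column a document, so `row("belt")` shows at a glance which reports mention the belt.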

EXAMPLE: DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA
This allows us to:
- Discover relationships between concepts across all messages, e.g. what is usually mentioned together with issues such as brake problems?
- Discover topics mentioned in text data, e.g. understand the main topics (dealerships) and uncover emerging topics (battery issues).
- Discover patterns related to structured data, e.g. complaints about engine trouble are associated with a higher chance of car accidents.

EXAMPLE: DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA
How does this help?
Discovery of new insights and topics: text data (forum messages, emails, logs, records) typically contains rich but sparse or uncommon insights. Text mining allows you to:
- Parse and extract information from text data
- Reliably filter and retain important information
- Automatically group documents into similar topics, allowing discovery of important, large topics as well as rare, small ones
Text mining as input to predictive modeling: documents and records often contain important facts that can reliably predict outcomes. For example, any mention of bad maintenance habits will likely be associated with earlier car failure.
Supported by SAS natural language processing and wide multi-language support, text mining discovers key trends within large amounts of text, which can be used as clean, reliable input for data mining analysis.

BENEFITS
SAS Text Miner helps your organization to:
- Uncover previously undetected associations and relationships.
- Get a complete view of the data, and drill down to specific documents for more insight.
- Automate the time-consuming tasks of reading and understanding text.
- Analyse both text and non-text data to produce predictive models that spot more opportunities and recognize trends more accurately.
Discover hidden patterns in text data for insights and predictive modeling!

SAS TEXT MINER

SAS TEXT MINER ANALYTICAL WORKFLOW Text Mining Raw Data Model with Structured and Unstructured Data

EXAMPLE TEXT MINING PROCESS FLOWS

EXAMPLE TEXT MINING PROCESS FLOWS
Start with a table that contains either:
- Documents saved as a variable (column)
- A column that points to physical text files

EXAMPLE INPUT DATA VARIABLE CONTAINS FULL TEXT

EXAMPLE INPUT DATA VARIABLE CONTAINS POINTER TO TEXT FILE
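The two input shapes described above (full text in a column, or a column pointing to a text file) can be handled by one small loader. This is a minimal sketch: the column names `text` and `path` and the `load_documents` helper are hypothetical, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def load_documents(rows, text_col="text", path_col="path"):
    """Return one text string per row.

    A row either carries the full text in `text_col` or points to a
    text file through `path_col` (both column names are hypothetical).
    """
    docs = []
    for row in rows:
        if row.get(text_col) is not None:
            docs.append(row[text_col])
        else:
            with open(row[path_col], encoding="utf-8") as handle:
                docs.append(handle.read())
    return docs


# Demonstration with one row of each kind.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "complaint_2.txt")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("Brake light stays on.")
    table = [
        {"id": 1, "text": "Seat belt fails to retract."},
        {"id": 2, "path": file_path},
    ]
    documents = load_documents(table)
```

Either way, later steps see the same thing: a list of document strings in table order.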

EXAMPLE TEXT MINING PROCESS FLOWS Apply natural language processing algorithms to parse the documents and quantify information about the terms in the corpus.

TEXT PARSING NODE
- Tokenization: break sentences or documents into terms
- Stemming: identify the root form of a word (run, runs, running, ran, etc.)
- Synonyms
- Remove low-information words such as a, an, and the (stop list)
- Part-of-speech identification (noun, verb, etc.)
- Identify standard and custom entities (names, places, etc.)
- Multiword terms or phrases ("blue screen of death")
- Import custom entities, facts, and events as defined in SAS Enterprise Content Categorization (ECC)
- Include negation entities from SAS ECC for Sentiment Analysis
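Several of the parsing steps listed above (tokenization, a stop list, stemming, multiword terms) can be illustrated in a few lines. This is a minimal sketch, not the Text Parsing node's implementation: `STOP_LIST`, `STEMS`, `MULTIWORD_TERMS` and `parse` are hypothetical, and a small lookup table stands in for a real stemmer.

```python
import re

# Hypothetical stop list of low-information words.
STOP_LIST = {"a", "an", "and", "the", "to", "of"}

# Hypothetical lookup table standing in for a real stemmer.
STEMS = {"runs": "run", "running": "run", "ran": "run",
         "fails": "fail", "failed": "fail"}

# Hypothetical multiword terms that should be kept as single terms.
MULTIWORD_TERMS = ["blue screen of death"]


def parse(text):
    """Tokenize, keep multiword terms whole, drop stop words, map to stems."""
    text = text.lower()
    # Join multiword terms with underscores so tokenization keeps them whole.
    for phrase in MULTIWORD_TERMS:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = re.findall(r"[a-z_']+", text)
    kept = [t for t in tokens if t not in STOP_LIST]
    return [STEMS.get(t, t) for t in kept]


terms = parse("The laptop ran into a blue screen of death and fails to boot")
```

Note the ordering choice: multiword terms are protected before the stop list is applied, otherwise "of" would be removed from inside "blue screen of death".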

SUPPORTED LANGUAGES
Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, and Vietnamese
New in SAS 9.3: Russian, Greek, Vietnamese, Turkish, Czech, Indonesian, Thai, Danish, Norwegian, Slovak, Finnish, Romanian, Hebrew, Hungarian, Korean

EXAMPLE TEXT MINING PROCESS FLOWS Perform spell-checking and refine synonym lists. Discover related concepts using Concept Linking. Perform full text search. Subset documents and/or terms for further analysis.

TEXT FILTER NODE
- Spell checking
- Concept Linking
- Full text search
- Define additional synonyms
- Sub-setting: management of the terms and documents that are passed to subsequent nodes
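Two of the filtering ideas above, full text search and sub-setting of terms, can be sketched as follows. This is a minimal illustration under stated assumptions: `search`, `keep_terms` and the `min_docs` threshold are hypothetical, and a document-frequency cutoff is only one possible way to decide which terms to pass on.

```python
def search(documents, query):
    """Return the indices of documents that contain every word of the query."""
    wanted = query.lower().split()
    return [i for i, doc in enumerate(documents)
            if all(word in doc.lower().split() for word in wanted)]


def keep_terms(documents, min_docs=2):
    """Keep only terms that appear in at least `min_docs` documents."""
    doc_freq = {}
    for doc in documents:
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return sorted(t for t, n in doc_freq.items() if n >= min_docs)


# Hypothetical parsed documents.
docs = [
    "seat belt fails to retract",
    "seat belt stuck",
    "brake light stays on",
]

belt_docs = search(docs, "seat belt")
kept = keep_terms(docs)
```

Here `belt_docs` is a subset of documents and `kept` a subset of terms; either could be handed to a later analysis step.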

FILTER VIEWER

SAS Text Mining

CONCEPT LINKING
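Concept Linking looks at how terms are associated with each other across documents. A minimal sketch of the idea, assuming simple document-level co-occurrence counts: `link_counts`, `linked_to` and the `parsed_docs` sample are hypothetical, and the strength measure used by SAS Text Miner itself is not given in these slides.

```python
from collections import Counter
from itertools import combinations


def link_counts(documents):
    """Count how many documents mention each unordered pair of terms."""
    pairs = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def linked_to(term, pairs):
    """Return the terms that co-occur with `term`, strongest link first."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return scores.most_common()


# Hypothetical documents, already parsed into terms.
parsed_docs = [
    ["brake", "noise", "pedal"],
    ["brake", "pedal", "soft"],
    ["battery", "dead"],
]

pairs = link_counts(parsed_docs)
brake_links = linked_to("brake", pairs)
```

In this sample, "pedal" is the strongest link for "brake" because the two appear together in two documents.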

EXAMPLE TEXT MINING PROCESS FLOWS Analyze the documents to create topics and assign each document to one or more topics. In addition to derived topics, users can add their own topic definitions.

TEXT TOPIC NODE
- Multiple topics per document
- Soft clustering using rotated SVD (PROC SVD followed by PROC FACTOR)
- Allows automatic creation of single and multi-word topics
- User-defined topics and editing of automatic topics
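The phrase "rotated SVD" can be unpacked with a short worked equation. This is a sketch of the general technique only: the symbols $A$, $k$, $R$, $T$, $D$ and the cutoff $c_t$ are notation introduced here, and the exact rotation criterion and cutoffs used by PROC FACTOR and the Text Topic node are not stated in these slides.

```latex
% A: weighted term-by-document matrix (m terms, n documents).
% Rank-k truncated singular value decomposition:
A \approx U_k \Sigma_k V_k^{\top}

% For any orthogonal k-by-k rotation R (R R^{\top} = I), the same
% approximation can be written with rotated factors:
U_k \Sigma_k V_k^{\top}
  = (U_k R)\,(R^{\top} \Sigma_k V_k^{\top})
  = T D,
\qquad T = U_k R, \quad D = R^{\top} \Sigma_k V_k^{\top}

% T holds term-by-topic weights, D holds topic-by-document scores.
% Assumed soft assignment: document j belongs to topic t whenever
|D_{tj}| \ge c_t
```

Because $R$ cancels out, the rotation does not change the approximation of $A$; it only changes how the $k$ dimensions are described. A rotation such as varimax is typically chosen so that each topic has large weights on only a few terms, which makes topics easier to label. Since a document can exceed the cutoff for several topics at once, this gives multiple topics per document.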

INTERACTIVE TOPIC VIEWER

EXAMPLE TEXT MINING PROCESS FLOWS Analyze the documents to create clusters and assign each document to a single cluster.

CLUSTER VIEWER


EXAMPLE TEXT MINING PROCESS FLOWS Clusters can be further explored using the Segment Profile node to identify factors that differentiate data segments from the population.

SEGMENT PROFILE
The Segment Profile node is available on the Assess tab of Enterprise Miner. It allows the examination of segmented or clustered data to identify factors that differentiate data segments from the population.
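The idea of finding factors that differentiate a segment from the population can be illustrated by comparing a rate inside one segment with the overall rate. This is a minimal sketch: `profile_segment`, the `cluster` key and the `had_accident` flag are hypothetical, and the Segment Profile node's own statistics are not described in these slides.

```python
def profile_segment(rows, segment, variable, segment_key="cluster"):
    """Compare the rate of a 0/1 variable inside one segment with the overall rate."""
    in_segment = [r for r in rows if r[segment_key] == segment]
    segment_rate = sum(r[variable] for r in in_segment) / len(in_segment)
    population_rate = sum(r[variable] for r in rows) / len(rows)
    return segment_rate, population_rate


# Hypothetical clustered records with one structured 0/1 flag.
records = [
    {"cluster": 1, "had_accident": 1},
    {"cluster": 1, "had_accident": 1},
    {"cluster": 1, "had_accident": 0},
    {"cluster": 2, "had_accident": 0},
    {"cluster": 2, "had_accident": 0},
    {"cluster": 2, "had_accident": 0},
]

seg_rate, pop_rate = profile_segment(records, 1, "had_accident")
```

A segment rate well above or below the population rate marks the variable as a candidate differentiating factor for that cluster.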


EXAMPLE TEXT MINING PROCESS FLOWS: PREDICTION Several methods are available to use the unstructured data to create predictions.



LEARNING MORE

SAS TEXT MINER RESOURCES
- SAS Text Miner Product Web Site: http://www.sas.com/text-analytics/text-miner/index.html
- SAS Text Miner Technical Support Web Site: http://support.sas.com/software/products/txtminer/index.html
- SAS Text Miner Technical Forum (Join Today!): https://communities.sas.com/community/supportcommunities/sas_data_mining_and_text_mining
- SAS Training, Data Miner Training Path: http://support.sas.com/training/us/paths/dm.html
- Courses for SAS Text Miner: https://support.sas.com/edu/prodcourses.html?code=tm&ctry=us

Step-by-step how-to guide http://support.sas.com/documentation/onlinedoc/txtminer/index.html

Data for the step-by-step how-to guide

DISCUSSION FORUMS http://communities.sas.com

DISCUSSION FORUMS https://communities.sas.com/community/support-communities/text-analytics

COMPLIMENTARY ON-DEMAND WORKSHOPS http://www.sas.com/reg/offer/corp/handson

THANK YOU FOR USING SAS! www.sas.com