Text Analytics Evaluation Case Study - Amdocs


Text Analytics Evaluation Case Study - Amdocs
Tom Reamy, Chief Knowledge Architect, KAPS Group
http://www.kapsgroup.com
Text Analytics World, October 20, New York

Agenda
- Introduction
- Text Analytics Basics
- Evaluation Process & Methodology: Two Stages - Initial Filters & POC
- Initial Evaluation Results
- Proof of Concept: Methodology, Results
- Final Recommendation
- Conclusions

Introduction to Text Analytics: Text Analytics Features
- Noun Phrase Extraction
  - Catalogs with variants; rule-based, dynamic
  - Multiple types, custom classes: entities, concepts, events
  - Feeds facets
- Summarization
  - Customizable rules; map to different content
- Fact Extraction
  - Relationships of entities: people-organizations-activities
  - Ontologies: triples, RDF, etc.
- Sentiment Analysis
  - Rules; objects and phrases
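The "catalogs with variants" style of entity extraction described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation; the catalog entries and function names are invented.

```python
# Catalog-based entity extraction with variants: each canonical entity
# name maps to the surface forms that should resolve to it.
CATALOG = {
    "IBM": ["ibm", "international business machines"],
    "SAS": ["sas", "sas institute"],
}

def extract_entities(text):
    """Return canonical entity names whose variants appear in the text."""
    lowered = text.lower()
    return sorted(
        name for name, variants in CATALOG.items()
        if any(variant in lowered for variant in variants)
    )

print(extract_entities("The POC compared SAS Institute and IBM tools."))
# ['IBM', 'SAS']
```

A production system would add tokenization, disambiguation, and rule-based dynamic entities on top of this lookup, but the catalog-plus-variants core is the same idea.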

Introduction to Text Analytics: Text Analytics Features
- Auto-categorization
  - Training sets: Bayesian, vector space
  - Terms: literal strings, stemming, dictionary of related terms
  - Rules: simple position in text (title, body, URL)
  - Semantic network: predefined relationships, sets of rules
  - Boolean: full search syntax - AND, OR, NOT
  - Advanced: NEAR (#), PARAGRAPH, SENTENCE
- This is the most difficult feature to develop
- Build on a taxonomy; combine with extraction (if any of a list of entities and other words)
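A minimal sketch of how a Boolean categorization rule with a proximity operator might be evaluated. The rule, the 5-token window, and the helper names are invented for illustration; they do not reflect any vendor's actual rule language.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, a, b, window):
    """True if terms a and b occur within `window` tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def matches_rule(text):
    """Hypothetical rule: ('billing' NEAR/5 'error') AND NOT 'resolved'."""
    tokens = tokenize(text)
    return near(tokens, "billing", "error", 5) and "resolved" not in tokens

print(matches_rule("Customer reported a billing system error today"))  # True
print(matches_rule("Billing error was resolved yesterday"))            # False
```

Even this toy version shows why rule-based categorization is the hardest feature to develop: every category needs hand-tuned terms, operators, and exceptions.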

Evaluation Process & Methodology
- Start with self-knowledge; think big, start small, scale fast
- Eliminate the unfit
- Filter One - Ask experts: reputation, research (Gartner, etc.); market strength of vendor, platforms, etc.; feature scorecard (minimum, must-have); filter to top 3
- Filter Two - Technology filter: match to your overall scope and capabilities; a filter, not a focus
- Filter Three - In-depth demo (3-6 vendors); deep POC (2 vendors): advanced features, integration, semantics; focus on working relationship with the vendor
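The "feature scorecard" filter above can be illustrated with a toy example: drop any vendor missing a must-have feature, then keep the top 3 by feature coverage. The vendor names, feature labels, and scoring scheme here are placeholders, not the actual Amdocs scorecard.

```python
# Must-have features act as a hard filter; remaining features are a
# simple coverage score. Real scorecards would weight features instead.
MUST_HAVE = {"categorization", "entity_extraction"}

vendors = {
    "Vendor A": {"categorization", "entity_extraction", "sentiment"},
    "Vendor B": {"categorization", "entity_extraction"},
    "Vendor C": {"categorization", "sentiment"},  # missing a must-have
    "Vendor D": {"categorization", "entity_extraction", "sentiment",
                 "summarization"},
}

# Hard filter: every must-have feature must be present.
qualified = {name: feats for name, feats in vendors.items()
             if MUST_HAVE <= feats}

# Soft ranking: keep the top 3 by number of features covered.
top3 = sorted(qualified, key=lambda n: len(qualified[n]), reverse=True)[:3]
print(top3)  # ['Vendor D', 'Vendor A', 'Vendor B']
```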

Evaluation Process & Methodology: Amdocs Requirements / Initial Filters
- Platform: range of capabilities - categorization, sentiment analysis, etc.
- Technical: APIs, Java-based, Linux runtime
- Scalability: millions of documents a day
- Import-export: XML, RDF
- Total cost of ownership
- Vendor relationship - OEM
- Usability, multiple language support

Vendors of Taxonomy / Text Analytics Software
- Attensity
- Business Objects Inxight
- Clarabridge
- ClearForest
- Concept Searching
- Data Harmony / Access Innovations
- Expert Systems
- GATE (open source)
- IBM InfoSphere
- Lexalytics
- Multi-Tes
- Nstein
- SAS - Teragram
- SchemaLogic
- Smartlogic
- Synaptica

Initial Evaluation - 4 Demos
- Smartlogic
  - Taxonomy management, good interface
  - 20 types of entities, APIs, XML-HTTP
  - Full platform, but no sentiment analysis
- Expert Systems
  - Different approach: semantic network - 400,000 words / 3,500 rules, 65 types of relationships
  - Strong out of the box (80%), no training sets
  - Language concerns: no Spanish, high cost to develop new languages
  - Customization: add terms and relationships, develop rules; uncertain how much effort; use their professional linguists

Initial Evaluation - 4 Demos
- SAS-Teragram
  - Full platform: categorization, entity, and sentiment integrated
  - APIs, XML, Java - ease of integration
  - Strong history of the company, range of experience
- IBM Classification, Content Analytics
  - Two products
  - Classification Module: statistical emphasis; once trained, it can learn new words; rapid development, but depends on training sets
  - Content Analytics, LanguageWare Workbench: full platform

Initial Evaluation Findings
- SAS & IBM: full platform, OEM experience, multilingual; proven ability to scale, customizable components, mature tool sets
- SAS was the strongest offering: capabilities, experience, integrated tool sets
- IBM a good second choice: capabilities, experience; multiple products are both a strength and a weakness
- Single-vendor POC: demonstrates it can be done; ability to dive more deeply into capabilities and issues; stronger foundation for future development; learn the software better; but danger of missing a better choice
- Two-vendor POC: balance of depth and full testing

Phase II - Proof of Concept (POC)
- Measurable quality of results is the essential factor
- 4-week POC bake-off, or a short pilot
- Real-life scenarios: categorization with your content
- 2 rounds of development, test, refine - not out-of-the-box
- Need SMEs as test evaluators, and also to do an initial categorization of content
- Majority of time is spent on auto-categorization
- Need to balance uniformity of results with vendor-unique capabilities; have to determine this at POC time
- Taxonomy developers: expert consultants plus internal taxonomists

POC Design: Evaluation Criteria & Issues
- Basic test design: categorize a test set; score by file name and by human testers
- Categorization and sentiment accuracy: 80-90%
- Effort level per accuracy level: quantify development time for the main elements
- Comparison of two vendors - how to score? Combination of scores and a report
- Quality of content and of the initial human categorization; normalize among different test evaluators
- Quality of taxonomists: experience with text analytics software and/or experience with the content and with information needs and behaviors
- Quality of taxonomy structure: overlapping categories
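"Normalize among different test evaluators" can be done in several ways; one simple option (an assumption for illustration, not necessarily the method used in this POC) is to standardize each evaluator's scores so harsh and lenient graders become comparable:

```python
from statistics import mean, stdev

def zscores(scores):
    """Standardize one evaluator's scores to mean 0, unit variance,
    so evaluators with different scoring habits can be compared."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Two hypothetical evaluators rating the same five documents: one is
# harsh, one is lenient, but they agree on relative quality.
harsh = [2, 3, 2, 4, 3]
lenient = [7, 8, 7, 9, 8]
print([round(z, 2) for z in zscores(harsh)])
print([round(z, 2) for z in zscores(lenient)])
# The two lists are identical after normalization.
```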

Text Analytics POC Outcomes: Categorization of CSR Notes
- Content: 2,000 CSR notes categorized by humans; variation among the human categorizations
- Recall (finding all the correct documents) and precision (not categorizing documents from other categories); precision is harder than recall
- Two scores, raw and corrected; only raw precision for IBM
- The first score was very low; an extra round brought it up
- Uncategorized documents: 50,000; look at the top 10 in each category
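Recall and precision as defined above reduce to simple ratios over the scored documents. The counts below are illustrative only, not the POC's actual data:

```python
def recall(true_positives, false_negatives):
    """Recall: share of the correct documents that were found."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives, false_positives):
    """Precision: share of the categorized documents that were correct."""
    return true_positives / (true_positives + false_positives)

# Hypothetical counts for one category of CSR notes:
# 93 correctly categorized, 7 missed, 16 pulled in from other categories.
print(round(recall(93, 7) * 100, 1))      # 93.0
print(round(precision(84, 16) * 100, 1))  # 84.0
```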

Text Analytics POC Outcomes: Categorization Results

                            SAS     IBM
  Recall - Motivation       92.6    90.7
  Recall - Actions          93.8    88.3
  Precision - Motivation    84.3    -
  Precision - Actions       100     -
  Uncategorized             87.5    -
  Raw Precision             73      46

(Only raw precision was scored for IBM.)

Text Analytics POC Outcomes: Vendor Comparisons
- SAS has a much more complete set of operators: NOT, DIST, START; the IBM team was able to develop workarounds for some, at more development effort
- Operators impact most other features: sentiment analysis, entity and fact extraction, summarization, etc.
- SAS has relevancy scores, which can be used for precision and for applications
- Sentiment analysis: SAS has a workbench; IBM would require more development; SAS also has statistical modeling capabilities
- Development environment and methodology: IBM as a toolkit provides more flexibility, but it also increases development effort; it enforces good method

Text Analytics POC Outcomes: Vendor Comparisons - Conclusions
- Both can do the job
- Product vs. toolkit (SAS has toolkit capabilities also)
- IBM will require more development effort:
  - Boolean operators (NOT, DIST, START, etc.) in rules, entity and fact extraction
  - Sentiment analysis: rules, statistical
  - Summarization
  - Rule building: more programming than taxonomy
- IBM is harder to learn; the POC took 2x the effort for IBM

Text Analytics Evaluation Conclusions
- Start with self-knowledge: text analytics is not an end in itself
- Initial evaluation: filters, not scorecards; weights change the output, and you need self-knowledge to set good weights
- Proof of concept is essential: out-of-the-box performance doesn't tell you how a tool will work in the real world; your content and scenarios are your real world
- Importance of operators and relevancy for a platform; sentiment needs a full platform
- Everyone has room for improvement

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group - Knowledge Architecture Professional Services
http://www.kapsgroup.com