Text Analytics Software Choosing the Right Fit



Similar documents
Text Analytics Evaluation Case Study - Amdocs

Taxonomy Tools Requirements and Capabilities. Joseph A Busch, Project Performance Corporation Zachary R Wahl, Project Performance Corporation

Best Practices for Architecting Taxonomy and Metadata in an Open Source Environment

Taxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013

Auto-Classification for Document Archiving and Records Declaration

Data Search. Searching and Finding information in Unstructured and Structured Data Sources

Enterprise Content Categorization The Business Strategy for a Semantic Infrastructure

Taxonomies in Practice Welcome to the second decade of online taxonomy construction

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.

Tools for Taxonomies. Heather Hedden Taxonomy Consultant Hedden Information Management Heather Hedden

Internet of Things, data management for healthcare applications. Ontology and automatic classifications

Choosing a vendor: things to consider

How To Make Sense Of Data With Altilia

BearingPoint How to take your Performance Management to the next level

The Value of Taxonomy Management Research Results

Request for Information Page 1 of 9 Data Management Applications & Services

Text Mining - Scope and Applications

Configuring SharePoint 2013 Document Management and Search. Scott Jamison Chief Architect & CEO Jornata scott.jamison@jornata.com

IT services for analyses of various data samples

Oracle BI Application: Demonstrating the Functionality & Ease of use. Geoffrey Francis Naailah Gora

Reduce Cost, Time, and Risk ediscovery and Records Management in SharePoint

Flattening Enterprise Knowledge

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

USING TAXONOMIES AS MASTER REFERENCE METADATA FOR THE ENTERPRISE

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

Integrating GIS within the Enterprise Options, Considerations and Experiences

Delivering Smart Answers!

Analyzing survey text: a brief overview

Session 2: Designing Information Architecture for SharePoint: Making Sense in a World of SharePoint Architecture

Text Mining and its Applications to Intelligence, CRM and Knowledge Management

Using Master Data in Business Intelligence

how to gain insight from text

SOCIAL MEDIA MONITORING AND SENTIMENT ANALYSIS SYSTEM

How To Use Data Analysis To Get More Information From A Computer Or Cell Phone To A Computer

Real World Application and Usage of IBM Advanced Analytics Technology

Implementing Business Intelligence at Indiana University Using Microsoft BI Tools

Text Analytics: The Hurwitz Victory Index Report

KPMG Unlocks Hidden Value in Client Information with Smartlogic Semaphore

Vertical Data Warehouse Solutions for Financial Services

Survey Results: Requirements and Use Cases for Linguistic Linked Data

CLOUD ANALYTICS: Empowering the Army Intelligence Core Analytic Enterprise

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Search and Information Retrieval

Predictive Analytics: Turn Information into Insights

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER

TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS

BusinessObjects XI. New for users of BusinessObjects 6.x New for users of Crystal v10

Analance Data Integration Technical Whitepaper

Make the right decisions with Distribution Intelligence

Data collection architecture for Big Data

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Cloud Based Document Management

Chapter 6. Attracting Buyers with Search, Semantic, and Recommendation Technology

Westernacher Consulting

IBM SPSS Text Analytics for Surveys

Selecting a Taxonomy Management Tool. Wendi Pohs InfoClear Consulting #SLATaxo

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

In-Database Analytics

Big Data and Open Data

Compliance in Office 365 What You Should Know. Don Miller Vice President of Sales Concept Searching

Certified Information Professional (CIP) Certification Maintenance Form

Big Data & Security. Aljosa Pasic 12/02/2015

Get the most value from your surveys with text analysis

Why are Organizations Interested?

IBM SPSS Modeler Premium

III JORNADAS DE DATA MINING

How to leverage SAP HANA for fast ROI and business advantage 5 STEPS. to success. with SAP HANA. Unleashing the value of HANA

Practical meta data solutions for the large data warehouse

Monitoring Web Browsing Habits of User Using Web Log Analysis and Role-Based Web Accessing Control. Phudinan Singkhamfu, Parinya Suwanasrikham

Megaputer Intelligence

Implementation of Big Data and Analytics Projects with Big Data Discovery and BICS March 2015

Industry 4.0 and Big Data

Industry Models and Information Server

T HE I NFORMATION A RCHITECTURE G LOSSARY

Information Quality for Business Intelligence. Projects

84% of Migration Projects Fail Getting it Right in SharePoint

QUALITY CONTROL PROCESS FOR TAXONOMY DEVELOPMENT

Social Media Implementations

Elsa C. Augustenborg Gary R. Danielson Andrew E. Beck

How To Use Spagobi Suite

Transcription:

Text Analytics Software Choosing the Right Fit Tom Reamy Chief Knowledge Architect KAPS Group http://www.kapsgroup.com Text Analytics World San Francisco, 2013

Agenda Introduction Text Analytics Basics Evaluation Process & Methodology Two Stages Initial Filters & POC Proof of Concept Methodology Results Text Analytics and Text Analytics Conclusions 2

KAPS Group: General Knowledge Architecture Professional Services Virtual Company: Network of consultants 8-10 Partners SAS, SAP, FAST, Smart Logic, Concept Searching, etc. Consulting, Strategy, Knowledge architecture audit Services: Taxonomy/Text Analytics development, consulting, customization Evaluation of Enterprise Search, Text Analytics Text Analytics Assessment, Fast Start Technology Consulting Search, CMS, Portals, etc. Knowledge Management: Collaboration, Expertise, e-learning Applied Theory Faceted taxonomies, complexity theory, natural categories 3

Introduction to Text Analytics Text Analytics Features Noun Phrase Extraction Catalogs with variants, rule based dynamic Multiple types, custom classes entities, concepts, events Feeds facets Summarization Customizable rules, map to different content Fact Extraction Relationships of entities people-organizations-activities Ontologies triples, RDF, etc. Sentiment Analysis Rules Objects and phrases 4

Introduction to Text Analytics Text Analytics Features Auto-categorization Training sets Bayesian, Vector space Terms literal strings, stemming, dictionary of related terms Rules simple position in text (Title, body, url) Semantic Network Predefined relationships, sets of rules Boolean Full search syntax AND, OR, NOT Advanced DIST(#), ORDDIST#, PARAGRAPH, SENTENCE This is the most difficult to develop Build on a Taxonomy Combine with Extraction If any of list of entities and other words 5

Case Study Categorization & Sentiment 6

Case Study Categorization & Sentiment 7

8

Evaluation Process & Methodology Overview Start with Self Knowledge Think Big, Start Small, Scale Fast Eliminate the unfit Filter One- Ask Experts - reputation, research Gartner, etc. Market strength of vendor, platforms, etc. Feature scorecard minimum, must have, filter to top 3 Filter Two Technology Filter match to your overall scope and capabilities Filter not a focus Filter Three In-Depth Demo 3-6 vendors Deep POC (2) advanced, integration, semantics Focus on working relationship with vendor. 9

Design of the Text Analytics Selection Team Traditional Candidates IT&, Business, Library IT - Experience with software purchases, needs assess, budget Search/Categorization is unlike other software, deeper look Business -understand business, focus on business value They can get executive sponsorship, support, and budget But don t understand information behavior, semantic focus Library, KM - Understand information structure Experts in search experience and categorization But don t understand business or technology 10

Design of the Text Analytics Selection Team Interdisciplinary Team, headed by Information Professionals Relative Contributions IT Set necessary conditions, support tests Business provide input into requirements, support project Library provide input into requirements, add understanding of search semantics and functionality Much more likely to make a good decision Create the foundation for implementation 11

Evaluating Taxonomy/Text Analytics Software Start with Self Knowledge Strategic and Business Context Info Problems what, how severe Strategic Questions why, what value from the text analytics, how are you going to use it Platform or Applications? Formal Process - KA audit content, users, technology, business and information behaviors, applications - Or informal for smaller organization, Text Analytics Strategy/Model forms, technology, people Existing taxonomic resources, software Need this foundation to evaluate and to develop 12

Varieties of Taxonomy/ Text Analytics Software Taxonomy Management Synaptica, SchemaLogic Full Platform SAS, SAP, Smart Logic, Linguamatics, Concept Searching, Expert System, IBM, GATE Embedded Search or Content Management FAST, Autonomy, Endeca, Exalead, etc. Nstein, Interwoven, Documentum, etc. Specialty / Ontology (other semantic) Sentiment Analysis Lexalytics, Clarabridge, Lots of players Ontology extraction, plus ontology 13

Vendors of Taxonomy/ Text Analytics Software Attensity Business Objects Inxight Clarabridge ClearForest Concept Searching Data Harmony / Access Innovations Expert Systems GATE (Open Source) IBM Infosphere Lexalytics Linguamatics Multi-Tes Nstein SAS SchemaLogic Smart Logic Synaptica Temis 14

Initial Evaluation Factors Traditional Software Evaluation - Deeper Basic & Advanced Capabilities Lack of Essential Feature No Sentiment Analysis, Limited language support Customization vs. OOB Strongest OOB highest customization cost Company experience, multiple products vs. platform Ease of integration API s, Java Internal and External Applications Technical Issues, Development Environment Total Cost of Ownership and support, initial price POC Candidates 1-4 15

Initial Evaluation Factors Case Studies Amdocs Customer Support Notes short, badly written, millions of documents Total Cost, multiple languages, Integration with their application Distributed expertise Platform resell full range of services, Sentiment Analysis Twenty to Four to POC (Two) to SAS GAO Library of 200 page PDF formal documents, plus public web site People library staff 3-4 taxonomists centralized expertise Enterprise search, general public Twenty to POC with SAS 16

Phase II - Proof Of Concept - POC Measurable Quality of results is the essential factor 4 weeks POC bake off / or short pilot Real life scenarios, categorization with your content 2 rounds of development, test, refine / Not OOB Need SME s as test evaluators also to do an initial categorization of content Majority of time is on auto-categorization Need to balance uniformity of results with vendor unique capabilities have to determine at POC time Taxonomy Developers expert consultants plus internal taxonomists 17

POC Design: Evaluation Criteria & Issues Basic Test Design categorize test set Score by file name, human testers Categorization & Sentiment Accuracy 80-90% Effort Level per accuracy level Quantify development time main elements Comparison of two vendors how score? Combination of scores and report Quality of content & initial human categorization Normalize among different test evaluators Quality of taxonomists experience with text analytics software and/or experience with content and information needs and behaviors Quality of taxonomy structure, overlapping categories 18

Text Analytics POC Outcomes Evaluation Factors Variety & Limits of Content Twitter to large formal libraries Quality of Categorization Scores Recall, Precision (harder) Operators NOT, DIST, START, Development Environment & Methodology Toolkit or Integrated Product Effort Level and Usability Importance of relevancy can be used for precision, applications Combination of workbench, statistical modeling Measures scores, reports, discussions 19

POC and Early Development: Risks and Issues CTO Problem This is not a regular software process Semantics is messy not just complex 30% accuracy isn t 30% done could be 90% Variability of human categorization Categorization is iterative, not the program works Need realistic budget and flexible project plan Anyone can do categorization Librarians often overdo, SME s often get lost (keywords) Meta-language issues understanding the results Need to educate IT and business in their language 20

Text Analytics and Text Analytics Text Mining TA is pre-processing for text mining TA adds huge dimensions of unstructured text Now 85-90% of all content, Social Media TA can improve the quality of text Categorization, Disambiguated metadata extraction Unstructured text into data - What are the possibilities? New Kinds of Taxonomies emotion, small smart modular Information Overload search, facets, auto-tagging, etc. Behavior Prediction individual actions (cancel or not?) Customer & Business Intelligence new relationships Crowd sourcing technical support Expertise Analysis documents, authors, communities 21

Conclusion Start with self-knowledge what will you use it for? Current Environment technology, information Basic Features are only filters, not scores Integration need an integrated team (IT, Business, KA) For evaluation and development POC your content, real world scenarios not scores Foundation for development, experience with software Development is better, faster, cheaper Categorization is essential, time consuming Text Analytics opens up new worlds of applications 22

Questions? Tom Reamy tomr@kapsgroup.com KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com