Introduction to Text Mining and Semantics. Seth Grimes -- President, Alta Plana

Similar documents
Text Analytics Beginner s Guide. Extracting Meaning from Unstructured Data

Voice. listen, understand and respond. enherent. wish, choice, or opinion. openly or formally expressed. May Merriam Webster.

Text Mining - Scope and Applications

Best practices for evaluating and selecting content analytics tools

Auto-Classification for Document Archiving and Records Declaration

TEXT ANALYTICS INTEGRATION

IBM Content Analytics with Enterprise Search, Version 3.0

BI and the Unstructured Data Challenge

WHITEPAPER. Text Analytics Beginner s Guide

Why are Organizations Interested?

Industry Impact of Big Data in the Cloud: An IBM Perspective

How To Use Social Media To Improve Your Business

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

Business IntelliSENSE Social and Internal Data Marriage for Smarter Decisions

Voice of the Customer: Better CRM Through Text and BI Integration

itunes 1.6 Cognitive - The Building Blocks for Smarter Apps

GRAPHICAL USER INTERFACE, ACCESS, SEARCH AND REPORTING

An Ontology Based Text Analytics on Social Media

Beyond listening Driving better decisions with business intelligence from social sources

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS November 7, Machine Learning Group

IBM Social Media Analytics

The Future of Business Analytics is Now! 2013 IBM Corporation

Voice of the Customer: How to Move Beyond Listening to Action Merging Text Analytics with Data Mining and Predictive Analytics

New Publishing Frontiers Extracting Value from the Supply Chain

Engage your customers

How To Make Sense Of Data With Altilia

Delivering Smart Answers!

How To Understand The Value Of Big Data

Hexaware E-book on Predictive Analytics

Text Analytics for Competitive Analysis and Market Intelligence Aiaioo Labs

WHITE PAPER. Social media analytics in the insurance industry

[JOINT WHITE PAPER] Ontos Semantic Factory

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

IBM Customer Experience Suite and Predictive Analytics

Customer Experience Management

STAR WARS AND THE ART OF DATA SCIENCE

Multichannel analytics and discovery

THE STATE OF Social Media Analytics. How Leading Marketers Are Using Social Media Analytics

IBM Social Media Analytics

Overview, Goals, & Introductions

Government Management Committee. P:\2013\Internal Services\I&T\gm13005I&T (AFS # 17768)

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Whitepaper. Leveraging Social Media Analytics for Competitive Advantage

North Highland Data and Analytics. Data Governance Considerations for Big Data Analytics

SAS Text Analytics. SHRUG Feb.24, Fiona McNeill Copyright 2011, SAS Institute Inc. All rights reserved.

Text Technologies in the Mainstream

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-4, Issue-4) Abstract-

Beyond Watson: The Business Implications of Big Data

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Some Economics of Cultural PSI: the Micro Perspective

Unlocking the Power of Human Data

Enterprise Video Search ROI

Optimizing government and insurance claims management with IBM Case Manager

IBM G-Cloud - IBM Social Media Analytics Software as a Service

Data Refinery with Big Data Aspects

Big Data. Fast Forward. Putting data to productive use

Smart Financial Data: Semantic Web technology transforms Big Data into Smart Data

Provalis Research Text Analytics and the Victory Index

ORACLE SOCIAL ENGAGEMENT AND MONITORING CLOUD SERVICE

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

Search and Information Retrieval

a Marketwire Company Business Intelligence for Social Media request a demo - call

Solve Your Toughest Challenges with Data Mining

Discover How a 360-Degree View of the Customer Boosts Productivity and Profits. eguide

The Business Value of Predictive Analytics

Capture intelligence that matters

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities

Text Mining and Analysis

The Truth About Sentiment & Natural Language Processing

Manjula Ambur NASA Langley Research Center April 2014

Turning Big Data into a Big Opportunity

Solve your toughest challenges with data mining

Using Data Analytics to Detect Fraud. Other Data Analysis Techniques

Loyalty. Social. Listening

Advanced Analytics. The Way Forward for Businesses. Dr. Sujatha R Upadhyaya

The role of multimedia in archiving community memories

HOW TO DO A SMART DATA PROJECT

Are You Ready for Big Data?

IT services for analyses of various data samples

Taking A Proactive Approach To Loyalty & Retention

De la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data

Cleaned Data. Recommendations

Evaluating Multinational Communications Using Social Media Monitoring

Using a Multichannel Strategy to Deliver an Exceptional Customer Experience

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Deploying Big Data to the Cloud: Roadmap for Success

Sentiment Analysis on Big Data

KPMG Unlocks Hidden Value in Client Information with Smartlogic Semaphore

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Solve your toughest challenges with data mining

Achieving customer loyalty with customer analytics

Social Media Implementations

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Opportunities for Language Technologies in the Tourism Sector

WHITE PAPER. CRM Evolved. Introducing the Era of Intelligent Engagement

Reinventing Business Intelligence through Big Data

Do You Need to be a Data Scientist to Analyze Text? Fern Halper

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

NICE MULTI-CHANNEL INTERACTION ANALYTICS

Transcription:

Introduction to Text Mining and Semantics Seth Grimes -- President, Alta Plana

New York Times October 9, 1958

Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. -- Marti A. Hearst, Untangling Text Data Mining, 1999

Document input and processing Information extraction Knowledge management Hans Peter Luhn, A Business Intelligence System, IBM Journal, October 1958

Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance. Hans Peter Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, April 1958

This rather unsophisticated argument on significance avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established. -- Hans Peter Luhn, 1958

Miranda: O wonder! How many goodly creatures are there here! How beauteous mankind is! O brave new world, That has such people in't! Prospero: 'Tis new to thee. The shipwreck in The Tempest, Act I, Scene 1, in an 1797 engraving based on a painting by George Romney.

New York Times, September 8, 1957 Anaphora / coreference: They

Kind = type, variety, not a sentiment. External reference Unfiltered duplicates

The broadcast, media, and entertainment industries garner about 4% of the world s revenues but already generate, manage, or otherwise oversee 50% of the digital universe. Approximately 70% of the digital universe is created by individuals. The Diverse and Exploding Digital Universe, (IDC, 2008)

Web sites, news & journal articles, images, video. Blogs, forum postings, and social media. E-mail, Contact-center notes and transcripts; recorded conversation. Surveys, feedback forms, warranty & insurance claims. Office documents, regulatory filings, reports, scientific papers. And every other sort of document imaginable. Is Search up to the job?

How are the quality, value & authority of search results? Hotel s opinion Guest s opinion about Priceline Who profits from search?

How can we do better? We have many of the tools in place -- from Web 2.0 technologies The Diverse and Exploding Digital Universe, (IDC, 2008)

Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as a platform. -- Tim O Reilly, 2004 [A] move from personal websites to blogs and blog site aggregation, from publishing to participation, an ongoing and interactive process... to links based on tagging. -- Terry Flew, New Media: An Introduction, 2008 Web 2.0 is dynamic, personalized, interactive, collaborative.

How can we do better? We have many of the tools in place -- from Web 2.0 technologies to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth. The Diverse and Exploding Digital Universe, (IDC, 2008)

Text mining enables smarter search that better responds to user goals, e.g., answers

For even better findability: The Semantic Web is a web of data, in some ways like a global database. -- Tim Berners-Lee, 1998 Web 3.0 is Web 2.0 + the Semantic Web + semantic tools. Recurring themes: Semantically enriched content. Linked Data. Context sensitive. Location aware.

Text mining / analytics enables Web 3.0 and the Semantic Web. Automated content categorization and classification. Text augmentation: metadata generation, content tagging. Information extraction to databases. Exploratory analysis and visualization. Technical concepts: Linked Data Microformats, RDF, SPARQL OWL

I recently published a study report, Text Analytics 2009: User Perspectives on Solutions and Providers. I estimated a $350 million global market in 2008, up 40% from 2007. I relayed findings from a survey that asked

What are your primary applications where text comes into play? Brand / product / reputation management Competitive intelligence Voice of the Customer / Customer Experience Research (not listed) Customer service Content management or publishing Life sciences or clinical medicine Insurance, risk management, or fraud Financial services E-discovery Product/service design, quality assurance, Other Compliance Law enforcement 8% 7% 22% 19% 18% 17% 15% 15% 14% 13% 40% 37% 33% 33% 0% 10% 20% 30% 40% 50%

What textual information are you analyzing or do you plan to analyze? Current users responded: blogs and other social media (twitter, social-network sites, etc.) 62% news articles 55% on-line forums 41% e-mail and correspondence 38% customer/market surveys 35%

Do you need (or expect to need) to extract or analyze: Other 15% Other entities phone numbers, e-mail & street addresses 40% Metadata such as document author, publication date, title, headers, etc. Events, relationships, and/or facts Concepts, that is, abstract groups of entities Sentiment, opinions, attitudes, emotions Topics and themes Named entities people, companies, geographic locations, brands, ticker 53% 55% 58% 60% 65% 71% 0% 10% 20% 30% 40% 50% 60% 70% 80%

Please rate your overall experience your satisfaction with text mining.