Text Mining - Scope and Applications



Similar documents
Text Analytics Beginner s Guide. Extracting Meaning from Unstructured Data

Introduction to Text Mining and Semantics. Seth Grimes -- President, Alta Plana

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak

Text Mining: The state of the art and the challenges

TEXT ANALYTICS INTEGRATION

Sentiment Analysis on Big Data

Hexaware E-book on Predictive Analytics

How To Make Sense Of Data With Altilia

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities

Big Data and Analytics: Challenges and Opportunities

Social Media Implementations

WHITEPAPER. Text Analytics Beginner s Guide

Introduction. A. Bellaachia Page: 1

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Voice. listen, understand and respond. enherent. wish, choice, or opinion. openly or formally expressed. May Merriam Webster.

Voice of the Customer: How to Move Beyond Listening to Action Merging Text Analytics with Data Mining and Predictive Analytics

Analyzing survey text: a brief overview

Provalis Research Text Analytics and the Victory Index

Multichannel analytics and discovery

Deriving Business Intelligence from Unstructured Data

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Text Analytics. A business guide

The Big Data Paradigm Shift. Insight Through Automation

SPATIAL DATA CLASSIFICATION AND DATA MINING

COMP9321 Web Application Engineering

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Top Data Management Terms to Know Fifteen essential definitions you need to know

Content Analyst's Cerebrant Combines SaaS Discovery, Machine Learning, and Content to Perform Next-Generation Research

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS November 7, Machine Learning Group

Find the signal in the noise

Chapter ML:XI. XI. Cluster Analysis

Information Visualization WS 2013/14 11 Visual Analytics

Sunnie Chung. Cleveland State University

itunes 1.6 Cognitive - The Building Blocks for Smarter Apps

The Future of Business Analytics is Now! 2013 IBM Corporation

Build Vs. Buy For Text Mining

Why Enterprises Need a Social Media

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

How To Understand The Value Of Big Data

How To Create A Data Science System

Best practices for evaluating and selecting content analytics tools

STAR WARS AND THE ART OF DATA SCIENCE

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Data Warehousing and Data Mining in Business Applications

An Ontology Based Text Analytics on Social Media

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

IBM SPSS Text Analytics for Surveys

IBM Social Media Analytics

Using Data Analytics to Detect Fraud. Other Data Analysis Techniques

BI and the Unstructured Data Challenge

Big Data and utility function in bank services. Nikolay K. Vitanov 1

Database Marketing, Business Intelligence and Knowledge Discovery

IBM SPSS Modeler Text Analytics 16 User's Guide

Real World Application and Usage of IBM Advanced Analytics Technology

Social Media Analytics Summit April 17-18, 2012 Hotel Kabuki, San Francisco WELCOME TO THE SOCIAL MEDIA ANALYTICS SUMMIT #SMAS12

Data Mining System, Functionalities and Applications: A Radical Review

Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization

Megaputer Intelligence

Search and Information Retrieval

Text Analytics Software Choosing the Right Fit

Business Intelligence and Decision Support Systems

text data analytics insights unstructured predictive improve source Extracting Value from Unstructured Data use behavior characteristics customer

Big Data in Telecom value chain. Presented by: Gurjot S Sandhu Director Sales Xalted Information Systems Pvt. Ltd.

Miracle Integrating Knowledge Management and Business Intelligence

Text Analysis for Big Data. Magnus Sahlgren

Navigating Big Data business analytics

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

Future Directions in Text Analytics

how to gain insight from text

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1

I D C A N A L Y S T C O N N E C T I O N. I m p r o vi n g C o m m u n i c a t i o n s w i t h Au t o m a t e d T r a n s l a t i o n

LEVERAGING BIG DATA!

Big Data. Fast Forward. Putting data to productive use

SOCIAL MEDIA MEASUREMENT: IT'S NOT IMPOSSIBLE

Transforming the Telecoms Business using Big Data and Analytics

Data Analytics in Organisations and Business

Multichannel Customer Listening and Social Media Analytics

Why are Organizations Interested?

Direct-to-Company Feedback Implementations

Chapter 6 - Enhancing Business Intelligence Using Information Systems

CHAPTER 2 Social Media as an Emerging E-Marketing Tool

Text Analytics with Ambiverse. Text to Knowledge.

Big Data Executive Survey

3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools

Forward Thinking for Tomorrow s Projects Requirements for Business Analytics

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-4, Issue-4) Abstract-

relevant to the management dilemma or management question.

Telecom Italia s Reputation Monitoring Room

Big Data: Rethinking Text Visualization

The Value of Visualization for Understanding Data and Making Decisions

The Text Analytics Market(s)

Visual Analytics: Combining Automated Discovery with Interactive Visualizations

Manjula Ambur NASA Langley Research Center April 2014

Knowledge Discovery from patents using KMX Text Analytics

IBM Social Media Analytics

IBM SPSS Modeler Premium

Transcription:

Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss Latika Kaushik Software Developer at ACSG, Delhi Abstract Text mining which is also known as data mining from textual unstructured databases refers to the process of extracting interesting and non-trivial patterns or knowledge from text i.e. may it be in any format. Unstructured text is very common, and in fact may represent the majority of information available to a particular research or data mining project. Has text mining evolved so rapidly to become a mature field? This article attempts to shed some lights to the scope and applications of Text Mining. In conclusion, we highlight the scope, upcoming challenges of text mining and the opportunities it offers. What is text mining? Text mining is an art of text analytics which is one way to make qualitative or unstructured data usable by a computer. Qualitative data is descriptive data that cannot be measured in numbers and often includes qualities of appearance like color, texture, and textual description. Quantitative data is numerical, structured data that can be measured. However, there is often confusion between qualitative and quantitative categories. For example, a photograph might traditionally be considered qualitative data but when you break it down to the level of pixels, which can be measured. Text mining can help an organization derive potentially valuable business insights from text-based content such as word documents, email and postings on social media streams like Facebook, Twitter and professional media like LinkedIn. Mining unstructured data with natural language processing (NLP), statistical modeling and machine learning techniques can be challenging, however, because natural language text is often inconsistent. It contains ambiguities caused by inconsistent syntax and semantics, including slang, language specific to vertical industries and age groups, double entendres and sarcasm. Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-

52 Miss Latika Kaushik quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Why do Text Mining? Broadly speaking there are (so far) four main reasons for text mining: 1. to enrich the content in some way 2. to enable systematic review of literature 3. for discovery 4. for computational linguistics research. Text Mining and Data Mining Just as data mining can be loosely described as looking for patterns in data, text mining is about looking for patterns in text. However, the superficial similarity between the two conceals real differences. Data mining can be more fully characterized as the extraction of implicit, previously unknown, and potentially useful information from data.the information is implicit in the input data: it is hidden, unknown, and could hardly be extracted without recourse to automatic techniques of data mining. With text mining, however, the information to be extracted is clearly and explicitly stated in the text. It s not hidden at all most authors go to great pains to make sure that they express themselves clearly and unambiguously and, from a human point of view, the only sense in which it is previously unknown is that human resource restrictions make it infeasible for people to read the text themselves. The problem, of course, is that the information is not couched in a manner that is amenable to automatic processing. Text mining strives to bring it out of the text in a form that is suitable for consumption by computers directly, with no need for a human intermediary. Text Mining Tools Computer program makes any task easier.text mining computer programs are available from many commercial and open source companies and sources. Commercial AeroText a suite of text mining applications for content analysis. Content used can be in multiple languages. Angoss Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization capabilities via the embedded Lexalytics Salience Engine. The software provides the unique capability of merging the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis.

Text Mining - Scope and Applications 53 Attensity hosted, integrated and stand-alone text mining (analytics) software that uses natural language processing technology to address collective intelligence in social media and forums; the voice of the customer in surveys and emails; customer relationship management; e-services; research and e- discovery; risk and compliance; and intelligence analysis. Autonomy text mining, clustering and categorization software Lots many open source tools are also available. Practical Applications of Text Mining Some examples of practical applications of text mining techniques include: Spam filtering Creating suggestion and recommendations (like amazon) Monitoring public opinions (for example in blogs or review sites) Customer service, email support Automatic labeling of documents in business libraries Measuring customer preferences by analyzing qualitative interviews Fraud detection by investigating notification of claims Fighting cyberbullying or cybercrime in IM and IRC chat and so on Let me discuss few of them- Security applications Many text mining software packages are marketed for security applications, especially monitoring and analysis of online plain text sources such as Internet news, blogs, etc. for national security purposes. It is also involved in the study of text encryption/decryption. Software applications Text mining methods and software is also being researched and developed by major firms, including IBM and Microsoft, to further automate the mining and analysis processes, and by different firms working in the area of search and indexing in general as a way to improve their results. Within public sector much effort has been concentrated on creating software for tracking and monitoring terrorist activities. Online media applications Text mining is being used by large media companies, such as the Tribune Company, to clarify information and to provide readers with greater search experiences, which in

54 Miss Latika Kaushik turn increases site "stickiness" and revenue. Additionally, on the back end, editors are benefiting by being able to share, associate and package news across properties, significantly increasing opportunities to monetize content. Marketing applications Text mining is starting to be used in marketing as well, more specifically in analytical customer relationship management. A survey and marketing is done by using many of Text Mining Tools. Sentiment analysis Sentiment analysis may involve analysis of movie reviews for estimating how favorable a review is for a movie.such an analysis may need a labeled data set or labeling of the affectivity of words. Text has been used to detect emotions in the related area of affective computing.text based approaches to affective computing have been used on multiple corpora such as students evaluations, children stories and news stories. Academic applications The issue of text mining is of importance to publishers who hold large databases of information needing indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Potential Weakness: Finally, as a word of caution, text mining doesn t generate new facts and is not an end, in and of itself. The process is most useful when the data it generates can be further analyzed by a domain expert, who can bring additional knowledge for a more complete picture. Still, text mining creates new relationships and hypotheses for experts to explore further. Challenges of Text Mining There exists lots of challenges for data mining.let me discuss few of them- 1) Very high number of possible dimensions All possible word and phrase types in the language! 2) Unlike data mining: records (= docs) are not structurally identical records are not statistically independent 3) Complex and subtle relationships between concepts in text POL merges with Time-Changer

Text Mining - Scope and Applications 55 Time-Changer is bought by POL 4) Ambiguity and context sensitivity automobile = car = vehicle = Toyota 5) Data collection is free text Data is not well-organized 6) Semi-structured or unstructured Natural language text contains ambiguities on many levels 7) Lexical, syntactic, semantic, and pragmatic Learning techniques for processing text typically need annotated training examples CONCLUSION We have provided a very brief introduction to text mining, its scope and applications within the pages of this article. There are numerous challenges to the statistical community that reside within this discipline area. As in any data mining or exploratory data analysis effort, visualization of textual data is an essential part of the problem. The statistical community has a great deal to contribute to many of these problems. In this article we have presented the scope and challenges in the area of text mining.. References [1] Wikipedia, http://en.wikipedia.org/wiki/text_mining [2] http://eprints.ncrm.ac.uk/227/1/what_is_text_mining.pdf [3] Text Mining: Applications and Theory, Wiley InterScience [4] http://www.statsoft.com/textbook/text-mining/ [5] http://searchbusinessanalytics.techtarget.com/definition/text-mining [6] http://www.cs.waikato.ac.nz/~ihw/papers/04-ihw-textmining.pdf [7] http://www.publishingresearch.net/documents/prctextminingandscholarlypu blishinfeb2013.pdf [8] http://text-analysis.sourceforge.net/practical-applications [9] R. Feldman and I. Dagan, Knowledge discovery in textual databases (kdt), in Proceedings of the Conference 1995, [10] Google Search Engine on Knowledge Discovery and Data Mining,

56 Miss Latika Kaushik