Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications




Gary Miner
Dursun Delen
John Elder, Charlottesville, VA, USA
Andrew Fast, Charlottesville, VA, USA
Thomas Hill
Robert A. Nisbet, Santa Barbara, CA, USA

Major Guest Authors:
Jennifer Thompson, Woodward, OK, USA
Richard Foley, Raleigh, NC, USA
Angela Waner
Linda Winters-Miner
Karthik Balakrishnan, San Francisco, CA, USA

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Press is an imprint of Elsevier

ENDORSEMENTS FOR PRACTICAL TEXT MINING & STATISTICAL ANALYSIS FOR NON-STRUCTURED TEXT DATA APPLICATIONS
FOREWORD 1
FOREWORD 2
FOREWORD 3
ACKNOWLEDGMENTS
PREFACE
ABOUT THE AUTHORS
INTRODUCTION
LIST OF TUTORIALS BY GUEST AUTHORS

Part I Basic Text Mining Principles
1. The History of Text Mining
2. The Seven Practice Areas of Text Analytics
3. Conceptual Foundations of Text Mining and Preprocessing Steps
4. Applications and Use Cases for Text Mining
5. Text Mining Methodology
6. Three Common Text Mining Software Tools

Part II Introduction to the Tutorial and Case Study Section of This Book
AA. CASE STUDY: Using the Social Share of Voice to Predict Events That Are about to Happen
BB. Mining Twitter for Airline Consumer Sentiment

A. Using STATISTICA Text Miner to Monitor and Predict Success of Marketing Campaigns Based on Social Media Data
B. Text Mining Improves Model Performance in Predicting Airplane Flight Accident Outcome
C. Insurance Industry: Text Analytics Adds "Lift" to Predictive Models with STATISTICA Text and Data Miner
D. Analysis of Survey Data for Establishing the "Best Medical Survey Instrument" Using Text Mining
E. Analysis of Survey Data for Establishing "Best Medical Survey Instrument" Using Text Mining: Central Asian (Russian Language) Study Tutorial 2: Potential for Constructing Instruments That Have Increased Validity
F. Using eBay Text for Predicting ATLAS Instrumental Learning
G. Text Mining for Patterns in Children's Sleep Disorders Using STATISTICA Text Miner
H. Extracting Knowledge from Published Literature Using RapidMiner
I. Text Mining Speech Samples: Can the Speech of Individuals Diagnosed with Schizophrenia Differentiate Them from Unaffected Controls?
J. Text Mining Using STM, CART, and TreeNet from Salford Systems: Analysis of 16,000 iPod Auctions on eBay
K. Predicting Micro Lending Loan Defaults Using SAS Text Miner
L. Opera Lyrics: Text Analytics Compared by the Composer and the Century of Composition Wagner versus Puccini
M. CASE STUDY: Sentiment-Based Text Analytics to Better Predict Customer Satisfaction and Net Promoter Score Using IBM SPSS Modeler
N. CASE STUDY: Detecting Deception in Text with Freely Available Text and Data Mining Tools
O. Predicting Box Office Success of Motion Pictures with Text Mining
P. A Hands-On Tutorial of Text Mining in PASW: Clustering and Sentiment Analysis Using Tweets from Twitter
Q. A Hands-On Tutorial on Text Mining in SAS: Analysis of Customer Comments for Clustering and Predictive Modeling

R. Scoring Retention and Success of Incoming College Freshmen Using Text Analytics
S. Searching for Relationships in Product Recall Data from the Consumer Product Safety Commission with STATISTICA Text Miner
T. Potential Problems That Can Arise in Text Mining: Example Using NALL Aviation Data
U. Exploring the Unabomber Manifesto Using Text Miner
V. Text Mining PubMed: Extracting Publications on Genes and Genetic Markers Associated with Migraine Headaches from PubMed Abstracts
W. CASE STUDY: The Problem with the Use of Medical Abbreviations by Physicians and Health Care Providers
X. Classifying Documents with Respect to "Earnings" and Then Making a Predictive Model for the Target Variable Using Decision Trees, MARSplines, Naive Bayes Classifier, and K-Nearest Neighbors with STATISTICA Text Miner
Y. CASE STUDY: Predicting Exposure of Social Messages: The Bin Laden Live Tweeter
Z. The InFLUence Model: Web Crawling, Text Mining, and Predictive Analysis with 2010-2011 Influenza Guidelines CDC, IDSA, WHO, and FMC

Part III Advanced Topics
7. Text Classification and Categorization
8. Prediction in Text Mining: The Data Mining Algorithms of Predictive Analytics
9. Entity Extraction
10. Feature Selection and Dimensionality Reduction
11. Singular Value Decomposition in Text Mining
12. Web Analytics and Web Mining
13. Clustering Words and Documents
14. Leveraging Text Mining in Property and Casualty Insurance
15. Focused Web Crawling

16. The Future of Text and Web Analytics
17. Summary

GLOSSARY
INDEX
HOW TO USE THE DATA SETS AND THE TEXT MINING SOFTWARE ON THE DVD OR ON LINKS FOR PRACTICAL TEXT MINING