Unlocking Value from
Patanjali V, Lead Data Scientist, Tiger Analytics
Anand B, Director Analytics Consulting, Tiger Analytics




EXECUTIVE SUMMARY

Large volumes of unstructured data are now generated in the form of text, images, video and speech. This data can contain valuable information that companies can use to make better decisions. In this article, we focus on one form of unstructured data: speech. We present a use case in which we analyzed speech recorded during clinical trials to automate a significant part of the operational process, with the potential to cut quality control costs in half.

What Is Speech Analytics?

Speech has several aspects. Some elements, such as words, speech rate, tone and emotion, are readily discernible by humans. Others, such as minor variations in pitch and speech rate, are harder for humans to identify. Speech analytics is the characterization of speech based on these factors in order to derive actionable business insights from the data.

There are several ways in which speech can be analyzed, depending on the type of application:

Full transcription
Full transcription converts speech into text, as in applications like Siri or in transcribing meetings (for example, between a doctor and a patient), conferences, etc. Converting speech into text allows it to be searched more easily.

Speaker diarization
Speaker diarization separates sections of speech by speaker. When transcribing speech with more than one speaker, such as a meeting or a conference, it is important not just to convert speech to text but also to identify who is speaking.

Keyword detection
Keyword detection identifies specific keywords in an audio recording. Customer care centers can detect keywords like "unhappy" and "disappointed" and use them to monitor agent performance.

Speaker authentication/identification (voice fingerprinting)
Speaker authentication/identification involves identifying the unique characteristics of every speaker's voice that allow humans to differentiate between and identify speakers. Some fraud detection applications capture these unique features, create voice fingerprints during customer care interactions and compare them against known blacklists.

Emotion detection
Emotion detection identifies the emotional state of the speaker. This can help identify irate customers during customer care interactions, among other applications.

Other characteristics of conversation
These include pauses, noise, etc. Characteristics like loud noises or long pauses could be indicators of a bad customer care conversation.

Depending on the type of business problem, the analysis framework would have one or more of the above.
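As a rough illustration of keyword detection, the sketch below counts flagged words in a call that has already been transcribed to text. This is a simplification: production keyword spotting may work directly on the audio. The keyword list, the function name `detect_keywords` and the sample transcript are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real contact center would define its own.
NEGATIVE_KEYWORDS = {"unhappy", "disappointed"}


def detect_keywords(transcript: str, keywords: set[str]) -> Counter:
    """Count case-insensitive occurrences of each keyword in a transcript."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(tok for tok in tokens if tok in keywords)


# Made-up transcript, used only to exercise the function.
transcript = (
    "I am very unhappy with the delay. "
    "Honestly, I'm disappointed, really disappointed."
)
hits = detect_keywords(transcript, NEGATIVE_KEYWORDS)
print(hits)
```

Counts like these could then be aggregated per agent or per call as one input to performance monitoring.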

Problems Faced During Clinical Trials

Testing the efficacy of drugs for mental illnesses involves the doctor holding detailed discussions with patients to evaluate their mental state at various stages of the treatment. Clinical trials evaluate both the quality of these interviews and whether or not the drug meets its targets. Interview quality evaluation typically involves experts listening to audio recordings of the interviews and scoring them on various quality metrics. This manual review is expensive. The objective here is to use speech analytics to assist the manual reviewers and significantly cut the costs associated with review time.

Role of Speech Analytics

Pre-processing
We first removed background noise so that the spoken dialog could be heard clearly. We then split the files into sections of alternating speech and silence. Following this, we grouped the speech sections into clusters, each representing a different speaker.

Feature extraction
We then extracted several hundred features from the audio files, ranging from direct features like duration and amplitude to more abstract features like speech rate, frequency-wise energy content and mel-frequency cepstral coefficients (MFCCs). Among other things, these features helped capture information that is characteristic of a person, similar to how a human identifies a person by their voice.

Prediction
The objective was to predict an interview quality score, a single number that combines several qualitative aspects of interview quality. We computed this score manually for a few audio files and then trained machine learning models to identify inherent patterns and predict the score for all other audio files. We used various supervised machine learning techniques, including logistic regression, boosted trees, random forests and support vector machines.
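To make the pre-processing and feature extraction steps described above more concrete, here is a minimal sketch of splitting a recording into speech and silence and deriving a few direct features. It assumes a simple short-time energy threshold, which is one common approach and not necessarily the method used in this study. The function names, frame length, threshold and synthetic test signal are all illustrative assumptions. Noise removal, speaker clustering and MFCC extraction are omitted because they normally rely on dedicated signal-processing libraries.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def split_speech_silence(samples, frame_len, threshold):
    """Return runs of (label, start_frame, end_frame), end exclusive."""
    labels = ["speech" if e > threshold else "silence"
              for e in frame_energies(samples, frame_len)]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the signal or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def simple_features(segments, frame_len, sample_rate):
    """A few direct features: total durations and the longest pause."""
    frame_sec = frame_len / sample_rate
    speech = [(e - s) * frame_sec for lab, s, e in segments if lab == "speech"]
    silence = [(e - s) * frame_sec for lab, s, e in segments if lab == "silence"]
    return {
        "speech_seconds": sum(speech),
        "silence_seconds": sum(silence),
        "longest_pause_seconds": max(silence, default=0.0),
        "n_speech_segments": len(speech),
    }


# Synthetic signal (assumption): 0.5 s silence, 1 s of a 200 Hz tone, 0.5 s silence.
SAMPLE_RATE = 8000
FRAME_LEN = 400  # 50 ms frames
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(8000)]
signal = [0.0] * 4000 + tone + [0.0] * 4000

segments = split_speech_silence(signal, FRAME_LEN, threshold=0.01)
features = simple_features(segments, FRAME_LEN, SAMPLE_RATE)
print(segments)
print(features)
```

On real recordings the threshold would need tuning to the noise floor, and each speech segment would then be passed to richer feature extractors before clustering or modeling.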
The best-performing algorithm improved the accuracy of identifying bad interviews by more than 50% compared to the random baseline, meaning the cost of identifying potentially bad interviews was halved. In other words, in the same amount of time, one could identify and review twice the number of bad interviews and gain rich insights that should eventually improve the quality of clinical trials significantly.
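The comparison against a random baseline can be illustrated with a small worked example. The sketch below ranks interviews by a hypothetical model score (higher meaning more likely to be a bad interview) and compares the share of bad interviews found within a fixed review budget against the share expected from reviewing at random. The labels and scores are made up for illustration and are not results from the study; the function names are likewise assumptions.

```python
def hit_rate(review_order, is_bad, budget):
    """Fraction of the first `budget` reviewed interviews that are bad."""
    reviewed = review_order[:budget]
    return sum(is_bad[i] for i in reviewed) / budget


def lift_over_random(scores, is_bad, budget):
    """Ratio of the model-ranked hit rate to the random-review base rate."""
    # Review the interviews with the highest predicted risk first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    base_rate = sum(is_bad) / len(is_bad)  # expected hit rate of random review
    return hit_rate(ranked, is_bad, budget) / base_rate


# Made-up data: 1 marks a bad interview, scores are hypothetical model outputs.
is_bad = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.8, 0.3, 0.35, 0.05, 0.15]

lift = lift_over_random(scores, is_bad, budget=5)
print(f"lift over random review: {lift:.2f}")
```

Because the expected number of reviews needed per bad interview found is the reciprocal of the hit rate, any lift in hit rate translates directly into fewer reviews per bad interview identified.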

Conclusion

Speech analytics has potential applications in almost every business that involves some form of verbal interaction, from call centers to classrooms. With the increase in computing power and the rise of big data technologies, analyzing large volumes of unstructured speech data is becoming increasingly mainstream. When used appropriately, it can give a company a significant reduction in cost as well as a strong competitive advantage. Some functions, such as customer care, have started incorporating speech analytics, but there is still a long way to go before its full potential is realized.

About the Authors

Patanjali V, the primary author, is a Lead Data Scientist at Tiger Analytics. He leads advanced analytics engagements that involve complex/unstructured data.

Anand Bharadwaj, the co-author, is a Director at Tiger Analytics. He has 18+ years of experience in the consulting industry and loves to ensure business value realization of analytics solutions.

Tiger Analytics (www.tigeranalytics.com) provides Big Data and advanced analytics solutions to help businesses make data-driven business decisions. We bring deep expertise in data sciences along with an understanding of business needs and state-of-the-art technologies to solve business problems.

References

1. http://en.wikipedia.org/wiki/speech_analytics
2. https://blog.calltrackingmetrics.com/introducing-keyword-spotting-for-phone-calls/2014/08/
3. http://en.wikipedia.org/wiki/speaker_diarisation
4. http://www1.icsi.berkeley.edu/~vinyals/files/taslp2011a.pdf
5. http://en.wikipedia.org/wiki/mel-frequency_cepstrum
6. http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs

Contact Us

info@tigeranalytics.com
www.tigeranalytics.com
+1-408-508-4430

USA - East Coast
2321 Blue Ridge Rd., Suite 203, Raleigh, NC 27607

USA - West Coast
4701 Patrick Henry Drive, Building 16, Suite 14, Santa Clara, CA 95054

India - Chennai
No. D-1, SIDCO Industrial Estate, Guindy, Chennai - 600032, India