Estonian Large Vocabulary Speech Recognition System for Radiology



Similar documents
Turkish Radiology Dictation System

Slovak Automatic Transcription and Dictation System for the Judicial Domain

Automatic slide assignation for language model adaptation

Word Completion and Prediction in Hebrew

Slovak Automatic Dictation System for Judicial Domain

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings

Building A Vocabulary Self-Learning Speech Recognition System

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM

An Arabic Text-To-Speech System Based on Artificial Neural Networks

TED-LIUM: an Automatic Speech Recognition dedicated corpus

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Automated Transcription of Conversational Call Center Speech with Respect to Non-verbal Acoustic Events

CallSurf - Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content

Presentation Objective: Benefits to New Employee. Goal for Participants. Benefits to the Administration

Reading Assistant: Technology for Guided Oral Reading

ASSIGNMENT 2: THE HEALTHCARE RECORD AND HEALTHCARE DOCUMENTATION TECHNOLOGY

C E D A T 8 5. Innovating services and technologies for speech content management

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

First, read the Editing Software Overview that follows so that you have a better understanding of the process.

Strategies for Training Large Scale Neural Network Language Models

Creation of a Speech to Text System for Kiswahili

Golder Cat-Scan & MRI Center Automates Imaging Center Workflows

Evaluation of speech technologies

Medical Transcription

LSTM for Punctuation Restoration in Speech Transcripts

Generating Training Data for Medical Dictations

Dragon Solutions Using A Digital Voice Recorder

Dragon Solutions Transcription Workflow

Using the Amazon Mechanical Turk for Transcription of Spoken Language

Great Basin College is offering online training for Medical Transcriptionist

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Delivering Accelerated Results

Course Outline/Description Court Reporting

Speech Analytics. Whitepaper

U.S. Bureau of Labor Statistics. Radiology Tech

Industry Guidelines on Captioning Television Programs 1 Introduction

Efficient diphone database creation for MBROLA, a multilingual speech synthesiser

NEW YORK STATE UNIFIED COURT SYSTEM OFFICE OF COURT ADMINISTRATION. EXAMINATION ORIENTATION GUIDE: 2015 Senior Court Reporter Examination

Court Reporting AAS Degree Program Overview

Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)

Speech Recognition: What's Coming and Impact

Dragon Solutions. Using A Digital Voice Recorder

The Royal College of Radiologists

Acoustical Surfaces, Inc.

Training in Clinical Radiology

Editorial Manager(tm) for Journal of the Brazilian Computer Society Manuscript Draft

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Transcription FAQ. Can Dragon be used to transcribe meetings or interviews?

Carla Simões, Speech Analysis and Transcription Software

Optimizing Clinical Productivity:

DRAGON NATURALLYSPEAKING 12 FEATURE MATRIX COMPARISON BY PRODUCT EDITION

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Master of Arts in Linguistics Syllabus

Health Management Information Systems: Medical Imaging Systems. Slide 1 Welcome to Health Management Information Systems, Medical Imaging Systems.

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

Medical Transcriptionists

C115 Office Administration - Medical MTCU Code Program Learning Outcomes

Speech Transcription

Scalable Web and Mobile Solution for Healthcare Software Provider

Automated Speech to Text Transcription Evaluation

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果

Evaluating grapheme-to-phoneme converters in automatic speech recognition context

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane

Transcription:

Estonian Large Vocabulary Speech Recognition System for Radiology Tanel Alumäe, Einar Meister Institute of Cybernetics Tallinn University of Technology, Estonia October 8, 2010 Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 1 / 10

Radiology Radiology Radiology is the branch or specialty of medicine that deals with the study and application of imaging technology (such as X-ray, ultrasound, computer tomography) and radiation to diagnosing and treating disease. Radiologist views and interprets a radiology image and creates a report that describes the findings. Achilles tendon has a uniform structure, tears are not detected. Left, there is a small tissue swelling of the tendon side. Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 2 / 10

Motivation Radiologists eyes and hands are busy during the preparation of a radiological report In many hospitals, radiologists dictate the reports which are then converted to text by human speech transcribers Speech recognition systems have the potential to replace human transcribers and enable faster and less expensive report delivery In radiology, a typical active vocabulary is much smaller than in general communication and the sentences usually have a well-defined structure Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 3 / 10

Acoustic models We used various wideband Estonian speech databases for training acoustic models: Models: BABEL speech database (phonetically balanced dictated speech from 60 different speakers, 9h) transcriptions of Estonian broadcast news (mostly read news speech from around 10 different speakers, 7.5h) transcriptions of live talk programs from three Estonian radio stations (42 different speakers, 10h) MFCC features 25 phonemes, silence, nine fillers triphone models 2000 tied states, 8 Gaussian per state CMU Sphinx Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 4 / 10

Language model For training a language model: 1.6 million distinct reports, with 44 million word tokens Normalization: expanded and/or normalized different abbreviations using hand-made rules used a morphological analyzer for determining the part-of-speech (POS) properties of all words expanded numbers the resulting corpus was used for producing two corpora: one including verbalized punctuations and another without punctuations. vocabulary was composed of the most frequent 50 000 words Two trigram LMs one with verbalized punctuations and another without punctuations were built. The two LMs were interpolated into one final model. Perplexity 35, OOV 2.6% Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 5 / 10

Evaluation Recorded a small test corpus of radiology reports Dictated by 10 radiologists, 26 minutes per speaker on average Actual reports from our test set Recordings were manually trascribed Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 6 / 10

Results Used CMU Sphinx 3.7, running in 0.5 x real-time one-pass speaker independant system Speaker WER AL 7.3 AR 8.5 AS 8.5 ER 10.3 JH 13.3 JK 9.2 SU 10.7 VE 8.7 VS 11.9 Average 9.8 Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 7 / 10

Error analysis Errors: 17% of errors are word compounding errors a compound word is recognized as a non-compound, or vice versa (i.e., the only error is in the space between compound constituents) 17% due to spelling errors 11% normalization mistmatches (e.g., C kuus C six vs. C6) Thus, only around 55% of the errors were real recognition errors Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 8 / 10

Future work We have now more wideband speech data (55 hours in total) Adaptation: Acoustic model: adapt to the voice of a speaker Language model: adapt to the typical report content of a speaker (e.g., one radiologists might be specialized on MRI images) Perform Wizard of Oz style experiments where radiologists produce reports spontaneously for previously unseen images Post-processing: consistent normalization of read numbers, dates, abbreviations, and proper structuring of the generated reports Integrate the system into the radiology information system (RIS) Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 9 / 10

Thanks! Alumäe, Meister (TUT, Estonia) Estonian LVCSR System for Radiology October 8, 2010 10 / 10