Hidden Markov Model Toolkit (HTK) - Quick Start by David Philippou-Hübner



Similar documents
HTK Tutorial. Giampiero Salvi

Speech Technology Project 2004 Building an HMM Speech Recogniser for Dutch

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

Turkish Radiology Dictation System

Developing speech recognition software for command-and-control applications

Hardware Implementation of Probabilistic State Machine for Word Recognition

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Installation, Configuration, and Usage

How To Sync Quickbooks With Qvinci.Com On A Pc Or Macbook Or Mac Book (For A Webbook) With A Flashbook (For An Ubuntu Account) With An Ipo (For Macbook) On A Mac

SPELLING WORD #1: SENTENCE:

Automated Content Analysis of Discussion Transcripts

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings

Publish Acrolinx Terminology Changes via RSS

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Cybozu Garoon 3 Server Distributed System Installation Guide Edition 3.1 Cybozu, Inc.

Editorial Manager(tm) for Journal of the Brazilian Computer Society Manuscript Draft

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

32-Bit Workload Automation 5 for Windows on 64-Bit Windows Systems

>

Event Log Summary Report

Business Process Change Analyzer How-to guide

Magento Search Extension TECHNICAL DOCUMENTATION

Comparing Open-Source Speech Recognition Toolkits

Mobile Labs Plugin for IBM Urban Code Deploy

CS 348: Introduction to Artificial Intelligence Lab 2: Spam Filtering

Programming on the Web(CSC309F) Tutorial: Servlets && Tomcat TA:Wael Aboelsaadat

Restoring Microsoft SQL Server 7 Master Databases

Web page creation using VoiceXML as Slot filling task. Ravi M H

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Bitrix Site Manager ASP.NET. Installation Guide

Applications of speech-to-text in customer service. Dr. Joachim Stegmann Deutsche Telekom AG, Laboratories

JobScheduler Installation by Copying

IBM WebSphere Application Server V8.5 lab Basic Liberty profile administration using the job manager

Amon Agent. User Guide

Changing over the EPLAN Dictionary to SQL Server EPLAN Platform Version 2.4 Status: 05/2014

INTERNATIONAL EDUCATION AND TRAINING PROGRAMS FOR BACHELOR, MASTER AND PHD STUDENTS OF MGTU«STANKIN» YEAR

Getting started Cassandra Access control list

REDUCING THE COST OF GROUND SYSTEM DEVELOPMENT AND MISSION OPERATIONS USING AUTOMATED XML TECHNOLOGIES. Jesse Wright Jet Propulsion Laboratory,

BIRT Application and BIRT Report Deployment Functional Specification

Project management integrated into Outlook

Implementation of the Discrete Hidden Markov Model in Max / MSP Environment

django-cron Documentation

SciTools Understand Flavor for Structure 101g

NLP Programming Tutorial 5 - Part of Speech Tagging with Hidden Markov Models

AutoMerge for MS CRM 3

Generating Demand and Supply. Initial demand generation

Practice Fusion API Client Installation Guide for Windows

Parts Database to SQL Server EPLAN Platform Version 2.5 Status: 07/2015

18.2 user guide No Magic, Inc. 2015

Sitemap. Component for Joomla! This manual documents version 3.15.x of the Joomla! extension.

SSO Plugin. J System Solutions. Upgrading SSO Plugin 3x to 4x - BMC AR System & Mid Tier.

Bulk Downloader. Call Recording: Bulk Downloader

Copyright Texthelp Limited All rights reserved. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval

FWG Management System Manual

App Development with Talkamatic Dialogue Manager

BusinessObjects Enterprise XI Release 2

CS170 Lab 11 Abstract Data Types & Objects

Installing Ruby on Windows XP

Legacy Automated Testing Bridge QAComplete ALMComplete HP Quick Test Pro v

Compiere ERP & CRM Installation Instructions Linux System - EnterpriseDB

Dragon speech recognition Nuance Dragon NaturallySpeaking 13 comparison by product. Feature matrix. Professional Premium Home.

Configuring and Troubleshooting Internet Information Services in Windows Server 2008

Matisse Installation Guide for MS Windows

Cassandra Installation over Ubuntu 1. Installing VMware player:

Administrator s Upgrade Guide.

Supplement I.B: Installing and Configuring JDK 1.6

Microsoft Windows Server 2008: Configuring and Troubleshooting Internet Information Services IIS

AWS Schema Conversion Tool. User Guide Version 1.0

HADOOP CLUSTER SETUP GUIDE:

Installing (1.8.7) 9/2/ Installing jgrasp

ONLINE EXERCISE SYSTEM A Web-Based Tool for Administration and Automatic Correction of Exercises

NetIQ. How to guides: AppManager v7.04 Initial Setup for a trial. Haf Saba Attachmate NetIQ. Prepared by. Haf Saba. Senior Technical Consultant

MSI Admin Tool User Guide

How to Launch the CTC from the Command Line for ONS 15327

DRAGON NATURALLYSPEAKING 12 FEATURE MATRIX COMPARISON BY PRODUCT EDITION

MacroPhone. ISDN Call Monitor, Telephone Answering Machine and Fax-Functions over Networks

Version control. HEAD is the name of the latest revision in the repository. It can be used in subversion rather than the latest revision number.

TestTrack Test Case Management Quick Start Guide

Unicenter Desktop DNA r11

BRIC VPN Setup Instructions

Let s Learn Market Mechanism by Artificial Market Attached CD-ROM README file U-Mart project

Changing Over the EPLAN Project Management on SQL-Server EPLAN Platform Version 2.5 Status: 06/2015

Active Directory Authentication Integration

Avaya IP Telephone How To: Basic use and features

Configuration of a Load-Balanced and Fail-Over Merak Cluster using Windows Server 2003 Network Load Balancing

Spam Marshall SpamWall Step-by-Step Installation Guide for Exchange 5.5

Exchange Mailbox Protection Whitepaper

Perform this procedure when you need to add a recurring payment option, or when you need to change or withdraw it.

Transcription:

Hidden Markov Model Toolkit (HTK) - Quick Start by David Philippou-Hübner This 5 page step-by-step tutorial shows how to build a (minimal) automated speech recognizer using HTK. It is a compact version of the HTK Tutorial Example (Chapter 3) but mainly concentrates on the single tools provided in HTK as well as on the essential (configuration) files. The tutorial is especially addressed to undergraduates who prefer a compact presentation of relevant steps in order to build a speech recognizing system. Hence, this tutorial raises no claim to completeness, but hopefully guaranties a quick lift. For training merely one 7-word sentence (in German) is provided. The user should feel free to record and add further examples (using words from the provided lexica). Hints refer to the additional scripts in order to use the HTK tools in a more convenient way. Further, they help to automatize recurring processes like the iterative training with HERest for tance. Note that for the java programs a Java-Runtime-Environment (JRE) is required. 0. Provide(d) a) Training material, here: kino1.wav- kino5.wav ("Ich ") b) Pronunciation lexicon, here: german_lexicon.lex (for English: rm1_mono.lex) 1. Task Grammar a) Create file gram with following contents: $word = ; (SENT-START <$word> SENT-END) b) Create word network "wdnet" using HParse: > HParse gram wdnet 2. Creating the dictinonary a) Create a word list file "wlist" containing all words in the grammar in alphabetical order: b) Run HDMan: > HDMan -m -w wlist -n monophones1 -l dlog dict german_lexicon.lex Delivers the dictionary "dict", a list of occurring monophones "monophones1" and the log file "dlog" containing various statistics. Attention: the mapping between the words in "wlist" and pronunciation lexicon is case-sensitive, hence "" and "abend" differ! c) Edit "dict" by adding SENT-END [] sil SENT-START [] sil silence sil

at the correct positions (so that the list rema sorted) d) Check whether "monophones1" conta "sil" and "sp". If not, add sil sp at the end of the file. 3. Preparing the recorded data a) Create a word level MLF "words.mlf" containing the transcriptions of what is said in each.wav file and end each utterance with ".": #!MLF!# "*/kino1.lab". "*/kino2.lab". b) Create following edit script "mkphones0.led" containing: EX IS sil sil DE sp c) Generate the phone level MLF "phones0.mlf" via HLEd: > HLEd -l '*' -d dict -i phones0.mlf mkphones0.led words.mlf 4. Coding the data a) Create configuration file "config" with contents: #Coding parameters SOURCEKIND = WAVEFORMAT SOURCEFORMAT = WAV TARGETKIND = MFCC_0_D_A TARGETRATE = 100000.0 SAVECOMPRESSED = T SAVEWITHCRC = T

WINDOWSIZE = 250000.0 USEHAMMING = T PREEMCOEF = 0.97 NUMCHANS = 26 CEPLIFTER = 22 NUMCEPS = 12 ENORMALISE = F b) Create "codetr.scp" script file to define for each.wav file the location and name of its corresponding.mfc file: kino1.wav train/kino1.mfc kino2.wav train/kino2.mfc c) Make sure the path to the.mfc files exists (otherwise:> mkdir train) and run HCopy in order to extract the features: > HCopy -T 1 -C config -S codetr.scp 5. Creating monophone HMMs a) Create the "train.scp" file containing all files to be used in training: train/kino1.mfc train/kino2.mfc b) Create a 3 state left to right HMM prototype called "proto", according to the HTK Tutorial page 30. c) Disable the entries SOURCEKIND and SOURCEFORMAT in "config" by putting # in front of them! d) Run HCompV: > mkdir hmm0 > HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto > sh HCompV.sh hmm0 config train.scp proto Delivers a new "proto" HMM in hmm0 and the file "vfloors". e) Create "monophones0" file with the same content as "monophones1" except the "sp" entry. f) Create "hmmdefs" file by copying the content of proto for each occurring monophone (in "monophones0"), such that each monophone has its own tance of a HMM in "hmmdefs". Modify the names of each HMM, e.g. ~h "ah" for the monophone "ah" HMM. Leave out all ~o entries as they will be set globally in the "macros" file. > java MakeHMMDefs proto monophones0 g) Create the "macros" file by opening "vfloors".

Add ~o <MFCC_0_D_A> <VecSize> 39 at the beginning and save it as "macros". > java GenerateMacros vfloors h) Run HERest to re-estimate the HMM parameters: > mkdir hmm1 > HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0 i) Repeat several iterations with HERest. Create a new hmm{step_number} directory for each run and be sure to adapt the directory-names in -H and -M options! Use the script below to easily repeat any number of iterations. > sh HERest.sh hmm0 {name_end_dir} {#iterations} config phones0.mlf train.scp monophones0 6. Fixing the silence models a) Create the "sil.hed" script containing: AT 2 4 0.2 {sil.transp} AT 4 2 0.2 {sil.transp} AT 1 3 0.3 {sp.transp} TI silst {sil.state[3],sp.state[2]} b) Open the latest "hmmdefs" file and copy it to a new hmm{latest+1} directory and add a one-state "sp" HMM. Make sure its topology is correct (<NumStates> 3, <TransP> 3, etc). The values of its only emitting state 2 are copied from the center state (state 3) of the "sil" HMM. > java Add_SP hmmdefs {name_of_destination_file} c) Run HHEd: > mkdir hmm{latest+1} > HHEd -H hmm{latest}/macros -H hmm{latest}/hmmdefs -M hmm{latest+1} sil.hed monophones1 d) Repeat several iterations with HERest as in 5i), but using "monophones1" as last parameter. 7. Realigning the data [optional] This is for re-estimation in order to include different pronunciation variants. a) Run HVite: > HVite -l '*' -o SWT -b silence -C config -a -H hmm{latest}/macros -H hmm{latest}/hmmdefs -i aligned.mlf -m -t 250.0 -y lab -I words.mlf -S train.scp dict monophones1 b) Repeat several iterations with HERest as in 6d).

8. Recognizer Evaluation a) Create the "test.scp" file containing all files to be used in the test (for a first trial this file may be identical with "train.scp"). train/kino1.mfc train/kino2.mfc b) Run HVite and HResults. The final recognition results are saved in recout.mlf: > HVite -H hmm{latest}/macros -H hmm{latest}/hmmdefs -S test.scp -l '*' -i recout.mlf -w wdnet -p 0.0 -s 5.0 dict monophones1 > HResults -I words.mlf monophones1 recout.mlf > sh HVite.sh hmm{latest} test.scp recout.mlf wdnet dict monophones1 words.mlf I hoped you enjoyed this tutorial and that it was a helpful starting point in order to proceed with your own research. David Philippou-Hübner Chair of Cognitive Systems; Institute for Information and Communication Technology; Faculty of Electrical Engineering and Information Technology Otto von Guericke University Magdeburg Universitätsplatz 2, D-39106 Magdeburg, Germany