Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software

Similar documents
WinPitch LTL II, a Multimodal Pronunciation Software

Speech Signal Processing: An Overview

Praat Tutorial. Pauline Welby and Kiwako Ito The Ohio State University. January 13, 2002

A Short Introduction to Transcribing with ELAN. Ingrid Rosenfelder Linguistics Lab University of Pennsylvania

InqScribe. From Inquirium, LLC, Chicago. Reviewed by Murray Garde, Australian National University

Using ELAN for transcription and annotation

Thirukkural - A Text-to-Speech Synthesis System

EDM SOFTWARE ENGINEERING DATA MANAGEMENT SOFTWARE

Elan. Complex annotations of video and audio resources Multiple annotation tiers, hierarchically structured Search multiple coded files

Audacity Sound Editing Software

Efficient diphone database creation for MBROLA, a multilingual speech synthesiser

SWING: A tool for modelling intonational varieties of Swedish Beskow, Jonas; Bruce, Gösta; Enflo, Laura; Granström, Björn; Schötz, Susanne

The use of Praat in corpus research

Robust Methods for Automatic Transcription and Alignment of Speech Signals

User Guide for ELAN Linguistic Annotator

Transana 2.60 Distinguishing features and functions

Lecture 1-10: Spectrograms

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

Establishing the Uniqueness of the Human Voice for Security Applications

Detailed user guide for Audacity

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

EUROPEAN COMPUTER DRIVING LICENCE. Multimedia Audio Editing. Syllabus

Recording and Editing Audio with Audacity

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

DRAGON NATURALLYSPEAKING 12 FEATURE MATRIX COMPARISON BY PRODUCT EDITION

Annotation in Language Documentation

Fast Labeling and Transcription with the Speechalyzer Toolkit

DiVAS video archiving and analysis software

ADDING DOCUMENTS TO A PROJECT. Create a a new internal document for the transcript: DOCUMENTS / NEW / NEW TEXT DOCUMENT.

Audio Coding Algorithm for One-Segment Broadcasting

interviewscribe User s Guide

The Language Archive at the Max Planck Institute for Psycholinguistics. Alexander König (with thanks to J. Ringersma)

Transcribing and annotating audio and video: Jeff Good MPI EVA and the Rosetta Project

WINDAQ Data Acquisition and Playback Software

Develop Software that Speaks and Listens

Tutorial. Part One -----Class1, 02/05/2015

Things to remember when transcribing speech

Digitizing Sound Files

Search and Information Retrieval

TECHNICAL OPERATING SPECIFICATIONS

CLIO 8 CLIO 8 CLIO 8 CLIO 8

B3. Short Time Fourier Transform (STFT)

UNIVERSITY OF CALICUT

Ovation Operator Workstation for Microsoft Windows Operating System Data Sheet

EXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language

DeNoiser Plug-In. for USER S MANUAL

School Class Monitoring System Based on Audio Signal Processing

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis

Measuring and synthesising expressivity: Some tools to analyse and simulate phonostyle

1.1.1 Event-based analysis

LMELECTURES: A MULTIMEDIA CORPUS OF ACADEMIC SPOKEN ENGLISH

Formant Bandwidth and Resilience of Speech to Noise

Audio Editing. Using Audacity Matthew P. Fritz, DMA Associate Professor of Music Elizabethtown College

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING

Recording Supervisor Manual Presence Software

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

imc FAMOS 6.3 visualization signal analysis data processing test reporting Comprehensive data analysis and documentation imc productive testing

Recent advances in Digital Music Processing and Indexing

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

imc FAMOS 6.3 visualization signal analysis data processing test reporting Comprehensive data analysis and documentation imc productive testing

Annotation Pro Software Speech signal visualisation, part 1

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

Copyright Kinoma Inc. All rights reserved.

L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Transcribing with Annotation Graphs

Reviewed by Ok s a n a Afitska, University of Bristol

A user friendly toolbox for exploratory data analysis of underwater sound

Media Object Production - Hardware and Software Tools

Machinery condition monitoring software

The Connacht Education and Training Alliance. Programme Module for. Sound Engineering and Production. leading to. Level 5 FETAC

Raritan Valley Community College Academic Course Outline MUSC DIGITAL MUSIC COMPOSITION I

CMAS. Compact Radio Signal Monitoring Solution

RightMark Audio Analyzer 6.0. User s Guide

SASSC: A Standard Arabic Single Speaker Corpus

Dragon speech recognition Nuance Dragon NaturallySpeaking 13 comparison by product. Feature matrix. Professional Premium Home.

The Minor Third Communicates Sadness in Speech, Mirroring Its Use in Music

Call Recorder Oygo Manual. Version

Music technology. Draft GCE A level and AS subject content

SR2000 FREQUENCY MONITOR

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

RF Measurements Using a Modular Digitizer

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

Preservation Handbook

COPYRIGHT 2011 COPYRIGHT 2012 AXON DIGITAL DESIGN B.V. ALL RIGHTS RESERVED

M3039 MPEG 97/ January 1998

Removing Primary Documents From A Project. Data Transcription. Adding And Associating Multimedia Files And Transcripts

Transcription Format

A new dimension of sound and vibration analysis

Dictation Software Feature Comparison

LongoMatch:The Digital Coach. User Guide

Transcription:

Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1

Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis and Transcription Software 2

Methods for Speech Acoustic Analysis Analog-to-Digital Conversion Firstly the sound wave has to be digitized sampling and quantization Oscillogram analysis Noise, intensity, duration and rhythm analysis Spectral analysis FFT, Fast Fourier Transform noise and formant structure analysis LPC, Linear Predictive Coding formant structure analysis Spectrogram analysis Noise, intensity, duration, format structure and rhythm analysis Melody analysis Intensity analysis 3

Why Speech Acoustic Analysis? Phonetic description Linguistic variability Geographical Social Style Speech technologies development Text-To-Speech systems Speech Recognition systems Dialog systems Speech pathology analysis and rehabilitation Phonetic conflict analysis for non-native speakers Speaker identification 4

Annotation The annotation of a speech corpus denotes all symbolic information that is directly related to the speech signal, either via the physical time scale, in which case we speak of a segmentation and labeling, or via some semantic content of the speech signal, in which case we speak of a transcription or tagging (Florian Schiel, Christoph Draxler, The Production of Speech Corpora, 2004) 5

Segmentation vs Alignment Segmentation It determines the sound or graphic time limits of the units chosen (sentence, word, syllable) Can be automated with reasonable success for good quality recording. A method which detects the more or less abrupt changes in the spectrogram along the time axis (Cosi, 1997) Alignment It links sound units with corresponding text units Automatic alignment Markov Models the limits of speech sound are obtained from the phonetic transcription. (Talin and Wightman 1994, Fohr, Mari, et Haton 1996) By synthesis Comparison between the time variations of the speech signal spectra with another speech signal, generated by a text to speech synthesizer operating on the text to align. (Malfrère and Dutoit, 2000) Problems with automatic alignment and segmentation Performs depends on speaker s voice characteristics Require good quality recordings to reduce the error rate Overlapping of speakers voices 6

Speech Analysis and Transcription Software ELAN, Max Plank Institute for Psycholinguistics PCquirer, Scicon R&D, Inc. PitchWorks, Scicon R&D. Inc. Praat, Institute of Phonetic Sciences, University of Amsterdam Prosogram, P. Mertens, Department of Linguistics, KU Leuven SFS, Speech Filing System, Department of Phonetics and Linguistics, University College London Speech Analyzer, CCS Software Development Speech Studio, Laryngograph Ltd. Transana, Wisconsin University Transcriber, C. Barras, LIMSI, CNRS - E. Geoffrois, DGA, CTA, GIP WaveSurfer, Centre for Speech Technology, KTH Winpitch, Pitch Instruments Inc. 7

ELAN, EUDICO Linguistic Annotator Max Plank Institute for Psycholinguistics http://www.mpi.nl/tools/elan.html It is an annotation tool that allows you to create, edit, visualize an search annotations for video and audio data ( ) for purposes of annotation, analysis and documentation. display a speech and/or video signals, together with their annotations time linking of annotations to media streams linking of annotations to other annotations unlimited number of annotation tiers as defined by the users different character sets export as tab-delimited text files Search options 8

PCquirer Scicon R&D, Inc. http://www.sciconrd.com/pcquirerx. html Operates XAudioBox and XAudioButtonBox for high quality stereo recording PlotFormants function directly from spectrogram or from external file Automatic data logging with a click of the mouse Real time spectrogram and FFT when recording Tape recorder style controls for recording Highly controlled playback output level control for perception experiments Stereo operation with sample rate of 11, 22 and 44 khz Sample rate, gain, filter rate controlled by software Reads many audio data types Real time spectrogram and FFT of audio Free hand labeling capability 9

PitchWorks Scicon R&D http://www.sciconrd.com/pitchwork s.html TOBI style labeling PitchWorks is the main tool for any intonation studies. 10 levels of tiers TOBI style labeling Capable of reading many different file types. FFT, LPC, Intensity, Spectrogram, Formant tracking,... Cepstral and Autocorrelation, pitch extraction methods Synchronized cursor between windows Automatic data logging Direct printing from every window Save each window as a bitmap (PC) View of the label window - The labels can be sorted by tiers, labels, or time. A label can be selected and the file can be zoomed to that label. No need to look into a long file to look for any labels 10

Praat P. Boersma & D. Weenink, Institute of Phonetic Sciences University of Amsterdam http://www.praat.org Developed by Paul Boersma and David Weenink, in Dutch means Talk General purpose speech tool : Speech analysis Speech synthesis Segmentation and labeling Speech manipulation Learning algorithms Statistics Listening experiments Online help, FAQ, manual Additional tutorials, scripts, resources, user groups 11

Prosogram P. Mertens, Department of Linguistics, KU Leuven http://bach.arts.kuleuven.be/pmertens/prosogram/ Transcription of prosody using pitch contour stylization based on a tonal perception model and automatic segmentation ( ) F0, intensity, voicing (V/UV) Obtain a segmentation Stylize the F0 of the selected time intervals Determine pitch range used in speech fragment. Plot stylized pitch and some annotation tiers (text, phonetic transcription). Use a musical (semitone) scale and add calibration lines at every 2 ST for easy interpretation of pitch intervals. The system is implemented as a Praat script 12

SFS Speech Filing System, University College London http://www.phon.ucl.ac.uk/resource/sfs/ Tutorials : http://www.phon.ucl.ac.uk/resource/sfs/howto/ ( ) It performs standard operations such as acquisition, replay, display and labeling, spectrographic and formant analysis and fundamental frequency estimation. It comes with a large body of ready made tools for signal processing, synthesis and recognition, as well as support for your own software development. Acquisition and replay Waveform processing, Laryngographic processing Fundamental frequency estimation formant frequency estimation Formant synthesis Spectrographic analysis Filterbank analysis/synthesis Resampling Speed/pitch changing Annotation Spectral cross-sections waveform envelope Filtering Signal editing Signal alignment 2 January 2008 13

Speech Analyzer CCS Software Development http://www.sil.org/computing/speecht ools/speechanalyzer.htm Use this software for recording, transcribing, and analyzing speech files. Transcribe speech files phonetically with IPA. Playback at a slower speed. Playback with repetition with variable length delay between repetitions. Add phonemic, orthographic, tone and gloss annotations to your transcription in an interlinear format. View sound file as a waveform, pitch plot, spectrogram, spectrum and various F1 vs. F2 displays. Music Analysis capability 14

SpeechStudio Laryngograph Ltd. http://www.laryngograph.com/pr_studio.htm Speech Studio is a software and hardware package, which has been specially designed for phoneticians, speech scientists and quantitative work by ENT clinicians and SLT s. It supports data recording direct to hard disk, real-time displays, and instantaneous quantitative analysis and pattern target mode for speech training. 15

Transana Wisconsin University http://www.transana.org/ Transana is designed to facilitate the transcription and qualitative analysis of video and audio data. It provides a way to view video or play audio recordings, create a transcript, and link places in the transcript to frames in the video ( ) It also features database and file manipulation tools that facilitate the organization and storage of large collections of digitized video." 16

Transcriber C. Barras, LIMSI, CNRS - E. Geoffrois, DGA, CTA, GIP http://www.ldc.upenn.edu/mirror/transcriber/ Transcriber is a tool for assisting the manual annotation of speech signals user-friendly graphical user interface speech recordings segmentation Transcription Labeling It is more specifically designed for: annotation of broadcast news recordings creating corpora used in the development of automatic broadcast news transcription systems 17

WaveSurfer Centre for Speech Technology, KTH http://www.speech.kth.se/wavesurfer/ WaveSurfer is an Open Source tool for sound visualization and manipulation Flexible interface - handles multiple sounds Common sound file formats - reads, and writes WAV, AU, AIFF, MP3, CSL, SD, Ogg/Vorbis, and NIST/Sphere Transcription file formats - reads, and writes HTK (and MLF), TIMIT, ESPS/Waves+ and Phondat. Support for encodings and Unicode Unlimited file size - playback and recording directly from/to disk Sound analysis - e.g. spectrogram and pitch analysis Customizable - users can create their own configurations. Localization support Extensible - new functionality can be added through a plugin architecture Embeddable - WaveSurfer can be used as a widget in custom applications Scriptable - hosts a built-in script interpreter 18

Winpitch Pitch Instruments Inc. http://www.winpitch.com Multimedia Real time monitoring of recordings (spectrogram, Fo, etc.) High precision segmentation, speech turns overlapping Direct transcription capability Assisted alignment of existing transcription Automatic building of speech segments database (XML output) Multimedia input formats (wav, mp2. aiff, au, mpeg, mp2, mp4, avi, gsm 6.1, etc.) Direct speech analysis (spectrogram, Fo, intensity, etc.) from speech segments database Prosodic morphing 19