Best Practices in Sociophonetics: Field Recording (Digital Tools, Transcription) Christopher Cieri Linguistic Data Consortium

Background
LDC creates, shares corpora & other language resources
  19 years, >78,000 copies, >1300 titles, >3100 orgs, 70 countries
Consortium = mutual aid society
  members contribute membership fee (library), or sometimes data
  receive ongoing rights to data created in years member contributes
  grants in data for students; other arrangements possible for junior faculty, underfunded groups
Large scale data collection
  billions of words of text
  thousands of hours of video
  tens of thousands of hours of audio
  >500 subjects * 2-4 sociolinguistic interviews + 12-24 calls
  tens of millions of annotation decisions
  coded >41,000 tokens for >150 dependent variables
Cieri, Field Recording, Linguistics Institute, July 17, 2011 Boulder, CO

LDC: Data Collection
news text
web text: newsgroups, blogs, zines
biomedical text & abstracts
printed, handwritten & hybrid documents
broadcast news
broadcast conversation
conversational telephone speech
lectures
meetings
interviews
read & prompted speech
role play
web video
animal vocalizations

LDC: Annotation
data scouting, selection, triage
audio-audio alignment, bandwidth, signal quality, language, dialect, program, speaker
quick and careful transcription, aligned at the turn, sentence, word level
orthographic & phonetic script normalization
phonetic, dialect, sociolinguistic feature & supralexical
documenting zoning
tokenization and tagging of morphology, part-of-speech, gloss
syntactic, semantic, discourse function, disfluency, sense disambiguation
relevance
identification, classification of mentions in text of entities, relations, events & coreference
knowledgebase population
time & location
summarization of various lengths from 200 words down to titles
translation, multiple translation, edit distance, translation post-editing, translation quality control
alignment of translated text at document, sentence & word levels
physics of gesture
identification, classification of entities and events in video

History
1999 Gregory Guy's workshop on publicly available corpora
2001 LDC DASL project, t/d deletion study
2002 William Labov's SLx Corpus and the DASLTrans
2003 Workshop at Penn on robust sociolinguistic methodology
2007 DiPaolo & Yaeger-Dror workshop with USSS, MIT-LL, Phanotics
2008 LREC paper on Phanotics project
2009 Update on methodology, resulting paper
2010 2nd DiPaolo & Yaeger-Dror workshop
2011 Chapter 3 in Sociophonetics: A Student's Guide
2011 NWAV workshop (?) on demographics for sociolinguistic fieldwork
  normalize core, familiarize readers with range of possible independent variables
2012 LSA workshop (?) on demographics, situation, attitude
  define terms in data category registry, publish modules
  program committee accepted, await decisions from NSF, LSA executive committee

Approach
support many research communities
  promiscuous about standards
knowledge & techniques for appropriate field recording
  maximize quality relative to situation
  no magic bullet, no single configuration
skipping question modules, interaction with subjects (DiPaolo & Yaeger-Dror 2011, Tagliamonte 2006)
numerous collection methods other than field interviews
existing speech corpora
progress orientation
  minimize task avoidance

Phonation, Sound Propagation
1. source, modified by glottis and oral and nasal cavities
   1. unfortunately other sounds as well
2. air molecules jostling each other
3. waves of pressure change expanding outward like a cone
4. bouncing off walls, ceiling, floor, furniture, people
   1. which may themselves be moving (see 2)
5. bouncing off eardrums and microphone diaphragms

Speaker
Speaker Selection Criteria
  active & lifelong participation in speech community under study
  place in a sample stratified by sex, age, socioeconomic class, ethnicity
  sometimes religious, political affiliation, membership in social group
Typically these speakers do not also sit still, look straight ahead, avoid fiddling with papers, microphone cables
Approach
  ameliorate situation
  accept subjects even if misbehaved
  oversample population, select best recordings

Uncertainty Principle
"the aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed; yet we can only obtain this data by systematic observation" (Labov 1972)
techniques to reduce interviewer impact in tension with optimizing recording quality
changes to conversational situation affect the resulting speech
  correcting subject speech
  head mounted mics
  recording booths
sociolinguists in the field generally forgo very best recording conditions in favor of (nearly) vernacular speech

Environment: Noise
We find potential subjects in poor locations for recording
  sub-optimal recording versus lost speakers
  modern world filled with noise we ignore; fieldworkers can become sensitive to noise, minimize
Sources
  indoor noise: televisions, radios, music players, but also refrigerator, faucet, lighting (particularly fluorescent/neon), HVAC, computers, clocks, phones
  outdoor noise: traffic, outdoor play & intermittent noise near school, field, hospital, police station
Location
  prepare
  select
  mitigate

Environment: Reflection
Reflection
  sound reflects from surface at angle of incidence, in amount related to size, shape, material properties of surface
  generally, large, flat, smooth, hard surfaces reflect more than small, irregularly shaped, textured, soft surfaces
  short walls, coverings, carpets, curtains better than long, empty walls, flat ceilings, bare floors, big windows, mirrors, long tables
  rooms that are squares or cubes are problematic for recording
  right angle corners increase reflection

Environment: Distance
Distance
  inverse square law: energy of wave front decreases as a function of square of distance from source
  optimal distance from mouth to mic depends principally upon mic
    follow operations manual (online)
  closer = better, except
    avoid proximity effect for directional microphones
    avoid placing microphone directly in airstream from mouth/nose
    avoid placing lavaliers in shadow of chin
Interviewer at normal conversational distance
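As a rough illustration of the inverse square law on this slide, the sketch below computes how much the level of a source drops as the mouth-to-mic distance grows. It assumes an idealized point source in a free field with no reflections, which real rooms do not satisfy; the function name level_change_db is hypothetical and not from the original slides.

```python
import math


def level_change_db(d_ref_cm: float, d_new_cm: float) -> float:
    """Change in level (dB) when the mic moves from d_ref_cm to d_new_cm.

    Assumption: intensity is proportional to 1 / distance**2
    (point source, free field, no room reflections).
    """
    if d_ref_cm <= 0 or d_new_cm <= 0:
        raise ValueError("distances must be positive")
    intensity_ratio = (d_ref_cm / d_new_cm) ** 2
    return 10 * math.log10(intensity_ratio)


if __name__ == "__main__":
    # Compare several distances against a 20 cm lavalier placement.
    for d in (20, 40, 80, 160):
        print(f"{d:>4} cm: {level_change_db(20, d):6.1f} dB relative to 20 cm")
```

Under these assumptions each doubling of distance costs about 6 dB, which is one way to read "closer = better".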

Equipment: Sensors
One for each subject + one for the interviewer
Types
  condenser (frequently electret)
    better frequency response, sensitivity, louder output
    small form factor, low cost
  dynamic
    rugged
    don't require their own power supply
Power
  batteries inconvenient, risky
  plug-in power specifications vary across microphones, recorders
  battery packs add bulk
Confirm compatibilities in advance of purchase
Stock adequate supplies

Equipment: Sensors
Polar pattern
  omnidirectional
    capture sound with same sensitivity from every direction: front, back, sides, above, and below
    robust to placement, movement
  directional
    robust to noise, especially from behind
    susceptible to proximity effect, boosting lower frequencies when <15 cm from source
directional preferred when noise is principal concern; requires proper placement, well behaved subjects
omnidirectional more flexible, better fidelity when noise is not primary concern

[Figure: Polar Pattern]
Cieri, Strassel: Robust, Digital, Empirical, Reproducible Sociolinguistic Methodology, NWAV 39 November 4-6, 2010 San Antonio, Texas

Equipment: Microphone: Mounting
lavalier
  good quality recordings through near, unobtrusive placement
  attached via a clip to a jacket lapel
  worn around neck on a lanyard, stabilized with pin or tape
  <= 20 cm from mouth (>= 15 cm if directional)
  not directly in airstream from the speaker's mouth or nose
  not placed in the shadow of the chin
  not attached to the collar or placket of a shirt or blouse
head mounted
  typically avoided because obtrusive
  however, current popularity of headgear suggests stronger
others
  stand-mounted, hand-held microphones obvious, risk making subjects self-conscious

Equipment: Microphone: Frequency Response
different materials & manufacturing processes yield different sensitivities to different frequencies
ideal might be a mic with flat frequency response across frequencies in which human speech is produced
very few mics, even fewer low-cost microphones, have very flat response across all speech frequencies
having identified range of mics that meet other criteria, compare frequency response to provide the best quality for intended use

[Figure: Frequency Response]

Equipment: Recorders
[Figure: Sampling Rate]

Equipment: Recorders
[Figure: Sample Size]

Equipment: Recorders
[Figure: Aliasing]

Equipment: Recorders
Sampling Rate
  16kHz, if appropriate given source, e.g. less needed for telephone
Sample Size
  16 bits
Compression
  Why risk it?
Storage
  sampling rate * sample size/8 = bytes per second
  96,000 * 24/8 * 60 * 60 = ~1GB/hour
Analytic Software Requirements
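The storage arithmetic on this slide can be checked with a short sketch. It assumes uncompressed linear PCM and ignores file-header overhead; the function name bytes_per_hour and the channels parameter are illustrative additions, not part of the original slides.

```python
def bytes_per_hour(sampling_rate_hz: int, sample_size_bits: int, channels: int = 1) -> int:
    """Storage for one hour of uncompressed PCM audio, in bytes.

    Assumption: no compression and no container/header overhead.
    """
    bytes_per_second = sampling_rate_hz * sample_size_bits // 8 * channels
    return bytes_per_second * 60 * 60


if __name__ == "__main__":
    # The slide's example: 96 kHz, 24-bit, one channel.
    print(bytes_per_hour(96_000, 24) / 1e9, "GB/hour")
    # The recommended 16 kHz, 16-bit setting, two channels.
    print(bytes_per_hour(16_000, 16, channels=2) / 1e9, "GB/hour")
```

At 96 kHz and 24 bits this gives 1,036,800,000 bytes, matching the slide's ~1GB/hour; the 16 kHz, 16-bit setting needs roughly a ninth of that per channel.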

Equipment: Recorders
Desiderata
  adequate quality @ affordable price
  standard digital format, 16-bit samples, 16kHz sampling
  uncompressed, nonproprietary, allowing universal random access
  standard data interface for moving speech files to computer
  small, unobtrusive, very portable
  simple to use
  adequate storage and battery life for 1 entire day in the field
  monitors for battery life, remaining storage, level, clipping
  2 channels with separate adjustments
  solid-state
  compatible with the microphone's connector type (TRS, XLR), power protocol (plug-in, phantom)

Equipment: Recorders
H2, SP-CMC-2, PMD620, H4, DR-100

Optimal Methods
make coding efficient, allowing researchers to
  consider greater percentage of tokens/variable
  investigate more variables
minimize misses
improve accuracy and balance
improve consistency
retains accurate time and sequence information
retains mapping among sound, transcript, tokens, coding, analysis and examples in publication
encourages re-use of data
  each additional pass requires less effort than original
  re-use & reanalysis profits from previous preparation

Model

Segmentation
Divides corpus into manageable units
  indicates structural boundaries in recording
  provides time-alignment for transcripts and other annotations
  transcript becomes index to audio
  simplifies subsequent transcription, token selection, processing, analysis
  8 seconds for transcription, forced alignment (FA) runs better, Praat can display
Preserve integrity of original signal
  virtual, not actual, chopping of digital signal
  allows multiple segmentations of the same event
Speech Activity Detection (SAD)
  technology exists for some audio types (LDC has telephone, BUT has broadcast)
  segments by pause group
  need training material (segmented, representative sociolinguistic data)
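One way to picture "virtual, not actual, chopping" is to store segments as time offsets into an untouched signal, so that several segmentations of the same recording can coexist. The sketch below is a minimal illustration under that reading; the Segment class, the samples_for function, and the placeholder signal are all hypothetical and not taken from any LDC tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Segment:
    """A virtual segment: only time offsets are stored, never audio."""
    start_s: float
    end_s: float
    label: str = ""


def samples_for(segment: Segment, signal: Sequence[int], sampling_rate_hz: int) -> Sequence[int]:
    """Return the samples a segment points at; the signal itself is not modified."""
    first = round(segment.start_s * sampling_rate_hz)
    last = round(segment.end_s * sampling_rate_hz)
    return signal[first:last]


if __name__ == "__main__":
    rate = 16_000
    signal = list(range(rate * 3))  # 3 seconds of placeholder samples

    # Two independent segmentations of the same recording.
    by_turn = [Segment(0.0, 3.0, "speaker A turn")]
    by_pause_group = [Segment(0.0, 1.25, "pause group 1"),
                      Segment(1.5, 3.0, "pause group 2")]

    for seg in by_turn + by_pause_group:
        print(seg.label, len(samples_for(seg, signal, rate)), "samples")
    print("original length unchanged:", len(signal))
```

Because the segments are only offsets, a transcript aligned to them can serve as an index back into the audio at any later pass.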

Segmentation
Segmentation for a specific purpose
  speaker turn, breath/pause group (1xRT), utterance, SU ( 5xRT)
  word level, phone level
    best handled as additional pass
    imparts additional level of analysis
    more difficult/costly, requires specialists
    free with forced alignment
Issues
  levels of granularity
  multiple speakers on one channel
  overlapping speech even across channels
  how long is a pause?
  additional features: background, non-speaker noise, SID, style

Time as Variable
Time is on the horizontal axis. Conversational situation (style) is on the vertical. Larger numbers mean greater formality.
  4+ are elicited styles
  3 is the default interview situation
  2 is for narratives and extended descriptions
  1 is for speech to another party
The longer interview clearly provides greater opportunities to study style shifting!

Transcription
Stoker (1897) provides early justification for transcription in related field

"He accordingly set the phonograph at a slow pace, and I began to typewrite from the beginning of the seventeenth cylinder."

"He thinks that in the meantime I should see Renfield, as hitherto he has been a sort of index to the coming and going of the Count. I hardly see this yet, but when I get at the dates I suppose I shall. What a good thing that Mrs. Harker put my cylinders into type! We never could have found the dates otherwise."

Stoker, Bram (1897) Dracula

Transcription
Why transcribe?
  index to audio, intermediary to later coding
  searchable
  learn about session
How to transcribe?
  verbatim
  no correction
  standard orthography, punctuation
  conventions for
    unintelligible speech
    non-standard variants
    speaker restarts, disfluencies, hesitations
7-10xRT using Transcriber, XTrans
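The 7-10xRT figure (7 to 10 hours of work per hour of audio) can be turned into a quick planning estimate. The sketch below assumes effort scales linearly with recording length; the function name transcription_hours is hypothetical.

```python
from typing import Tuple


def transcription_hours(audio_hours: float,
                        low_xrt: float = 7.0,
                        high_xrt: float = 10.0) -> Tuple[float, float]:
    """Estimated (low, high) transcription effort in hours.

    Assumption: effort is a constant multiple of real time (xRT),
    using the 7-10xRT range quoted on the slide as defaults.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return (audio_hours * low_xrt, audio_hours * high_xrt)


if __name__ == "__main__":
    low, high = transcription_hours(2.0)
    print(f"2 hours of audio: roughly {low:.0f} to {high:.0f} hours of transcription")
```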

Transcription
Multiple passes focusing on different tasks
  limit cognitive load of any one pass
tasks
  basic text
  disfluencies
  conversational situation
  dialect phenomena
  personal identifying information
  phonetics (inter-annotator agreement 70-90%)
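The slide does not say how the 70-90% inter-annotator agreement was measured. As one simple possibility, the sketch below computes raw percent agreement between two annotators who coded the same tokens; this choice of measure, the function name percent_agreement, and the example labels are assumptions made for illustration.

```python
from typing import Sequence


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Percentage of tokens on which two annotators chose the same label.

    Assumption: both annotators coded the same tokens in the same order.
    Raw agreement does not correct for chance agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same number of tokens")
    if not labels_a:
        raise ValueError("at least one token is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


if __name__ == "__main__":
    # Hypothetical codings of five t/d deletion tokens.
    annotator_1 = ["deleted", "retained", "deleted", "retained", "deleted"]
    annotator_2 = ["deleted", "retained", "retained", "retained", "deleted"]
    print(percent_agreement(annotator_1, annotator_2), "% agreement")
```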

Transcriber
http://trans.sourceforge.net/en/presentation.php
fastest segmentation
more user friendly than XTrans
Linux, Windows, OSX
open-source
multiple audio, text formats
requires full segmentation of audio
built for single-channel broadcast news
handling of overlapping speech

XTrans
http://www.ldc.upenn.edu/tools/xtrans/
fast segmenting, multi-channel, multi-speaker, overlaps, reads Transcriber, SPH
Linux, Windows, OSX (in emulation)

ELAN
http://www.lat-mpi.eu/tools/elan
video, reads Transcriber, SPH, interacts with Praat
Linux, Windows, OSX
segmentation a bit more complex than the others