The LENA™ Language Environment Analysis System: Audio Specifications of the DLP-0121
Michael Ford, Charles T. Baer, Dongxin Xu, Umit Yapanel, Sharmi Gray
LENA Foundation, Boulder, CO
LTR-03-2
September 2008
Software Version: V3.1.0
Copyright 2009, LENA Foundation, All Rights Reserved
Abstract
The LENA language environment analysis system was designed to estimate adult and key child interactions in natural home environments. Unlike speech collected in controlled clinical research environments, the speech used by the participants in this study was real, unrehearsed, and representative of each child's typical daily language environment. In this paper, we describe the Audio Processing System in terms of information flow, feature extraction, and segmentation identification. We also present the audio specifications that were either met or exceeded during the development and design of the LENA digital language processor (DLP).

Keywords
Audio specifications, feature extraction, segmentation, transcription, Digital Language Processor
1.0 Introduction
The LENA language environment analysis software V3.1.0 was developed to process and selectively filter audio and interference signals resulting from a natural data collection environment. The primary goals of the audio data processing are to estimate Adult Word Counts (AWC), Child Vocalizations (CV), and Conversational Turns (CT) between the adult and key child. Here, we describe the Audio Processing System in terms of information flow, feature extraction, and segmentation identification and detail the audio specifications that were either met or exceeded during the development and design of the LENA digital language processor (DLP).

2.0 LENA Processing Flow-Chart
The LENA Audio Processing System comprises four distinct components: information flow, information processing, algorithmic processing models, and professional human transcriptions (Figure 1).

[Figure 1 flow chart. Block labels: Child Voice & Environment Sound; LENA DLP; Feature Extraction; Segmentation and Segment ID; Statistical Models; Transcripts; Child Speech Process; Adult Speech Process; Conversation Analysis & Turn Estimation; ITS File; Reports & Display.]
Figure 1. LENA Language Environment Analysis Audio Processing System.
Initially, an audio file containing recording data from a child's natural home language environment is stored in the DLP. The data are first processed in the DLP to minimize disk space and battery power consumption. The audio data on the DLP are transferred through a USB port onto a computer, where the data are further processed and acoustic features are extracted. Various acoustic features are extracted for different purposes. Some features are primarily used for distinguishing speech signal from non-speech signal; others are used for child speech processing to distinguish child vocalization from other child sounds such as cries, vegetative sounds, and fixed signals.

At the heart of the LENA system is the capability for the algorithmic models to segment and appropriately identify sounds of varying amplitude and intensity. Features extracted from the audio data were segmented through iterative modelling processes into eight categories that identify the source of the audio signal: the key child (wearing the LENA DLP); other child; adult male and adult female; overlapping sounds (at least one human); noise; electronic (e.g. television/radio) sounds; and silence. Based on the statistical fit of each segment to the selected model, the seven categories other than silence are further dichotomized into clear (i.e., high likelihood) and unclear or quiet/distant (i.e., low likelihood) sub-categories.

Professional audio transcriptions were used to train the audio processing models, and the algorithms utilized the models to identify a variety of segments from the audio signals accurately and reliably. For example, it was necessary for the speech processing algorithms to differentiate adult speech from child speech, and to differentiate the speech of the key child from the speech of other children or non-speech sounds (e.g. cries or vegetative sounds). Thus, algorithmic models were built and optimized using the professionally transcribed segmentations as a basis for accuracy.
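The labeling scheme described above can be sketched as a small data structure. This is a minimal illustration and not the LENA implementation: the names `SourceCategory`, `label_segment`, and `all_labels` are hypothetical, and the threshold value is an assumption, since the report does not state how the likelihood cutoff is chosen.

```python
from enum import Enum


class SourceCategory(Enum):
    """The eight source categories named in the text."""
    KEY_CHILD = "key child"
    OTHER_CHILD = "other child"
    ADULT_MALE = "adult male"
    ADULT_FEMALE = "adult female"
    OVERLAP = "overlapping sounds"
    NOISE = "noise"
    ELECTRONIC = "electronic sounds"
    SILENCE = "silence"


# Hypothetical cutoff; the report does not give the actual criterion.
CLEAR_THRESHOLD = 0.0


def label_segment(category, fit_score, threshold=CLEAR_THRESHOLD):
    """Return (category, sub_category) for one segment.

    Silence has no sub-category. The other seven categories are split into
    "clear" (high likelihood) and "unclear" (low likelihood, quiet/distant).
    """
    if category is SourceCategory.SILENCE:
        return category, None
    sub_category = "clear" if fit_score >= threshold else "unclear"
    return category, sub_category


def all_labels():
    """List every possible label under this scheme."""
    labels = []
    for category in SourceCategory:
        if category is SourceCategory.SILENCE:
            labels.append((category, None))
        else:
            labels.append((category, "clear"))
            labels.append((category, "unclear"))
    return labels
```

Under this scheme there are 15 possible labels: two for each of the seven non-silence categories, plus silence.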
The accuracy and reliability of the LENA software V3.1.0 are described in LENA Foundation Technical Report LTR-05-2 and the transcription process in LTR-06-2.

After individual segments are identified, further processing generates key LENA data. Key child sound segments are analyzed through iterative processing to distinguish segments containing key child speech (including words, babbles, and pre-speech communicative sounds such as squeals, growls, or raspberries) from non-speech (including fixed signals and vegetative sounds) and to estimate the number and duration of vocalizations produced by the child. Adult sound segments are processed to estimate the number of adult words a child hears. Non-speech sounds such as coughing, vegetative sounds, etc., are filtered out and statistical models are used to estimate the number of words spoken in each adult segment. Refer to LENA Foundation Technical Reports LTR-04-2 and LTR-05-2 for information on the segmentation process and speech/non-speech classifications.

Statistical modeling is further used to detect Conversational Turns (CT), or back-and-forth alternation between the key child and an adult. For this purpose, a conversation was defined as a contiguous region containing live human speech, separated from the next conversation by a pause region of at least five seconds' duration that contains only non-live-human speech audio signals. CTs cannot cross conversation boundaries.

Results from the audio processing described above are written to the Interpreted Time Segments or ITS file, an XML-coded plain text compilation of every facet of data recorded and analyzed by the LENA software. Please see Technical Report LTR-04-2 for further information on the ITS file.

LENA software engineers continue to improve the algorithmic-based feature extraction and segmentation analyses. We intend to release upgrade versions of the software annually.

3.0 LENA System Audio Specification
The LENA System includes a Digital Language Processor (DLP) that was developed by hardware and software engineers at the LENA Foundation. Here, we describe the performance goals associated with the DLP, as well as hardware and operational performance.

3.1 Performance Goals
The LENA DLP is used for full-day recording sessions, for a maximum of 16 consecutive hours. Thus, the unit must be stable and maintain high levels of inter-recorder reliability, and the performance goals center on these two aspects of the design. LENA Foundation hardware engineers observed that signal level directly affected AWC. For example, if the signal variation was +/- 1 dB, a maximum of 4% variance in AWC was observed. However, if the signal variation was +/- 2 dB, the maximum variance observed was 18%.
In the example below, showing the signal variation of the current model DLP-0121, six DLP units were chosen at random to determine how consistently they recorded across two different passes. As revealed in Figure 2, the signal variation between passes was small for all DLP units tested.

[Figure 2 plot: recorded signal in dBFS (vertical axis, ticks from -30.50 to -31.50) for the first pass and second pass of each of six Digital Language Processor units (horizontal axis, units 1-6).]
Figure 2. Signal reliability using six DLP units chosen at random.

LENA Foundation hardware engineers sought to produce consistent inter-recorder sensitivity to minimize variation of report output from different DLP units. The target sensitivity was set to minimize variation (67 dBC SPL to -30 dBFS in the audio file). An additional performance goal was to achieve inter-recorder variation of no more than +/- 1 dB. Currently, inter-recorder (between-unit) variation is less than +/- 0.5 dB and intra-recorder (within a single unit) variation is less than +/- 0.1 dB.

Additional performance goals included flat frequency response (+/- 1 dB, 100-4000 Hz), on/off-axis linearity of sensitivity and frequency range, and low signal distortion. Finally, the unit was designed such that the recording was unaffected as the battery discharged to the lower operational limit.

The current DLP model DLP-0121 meets or exceeds US/Canada compliance standards. Standards for compliance that were either met or exceeded are shown in Table 1.
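The tolerance figures above amount to simple arithmetic on recorded levels. The sketch below is illustrative only: the readings are made-up values in the range shown in Figure 2, the helper names are hypothetical, and treating "+/-" variation as half the range of the readings is an assumption, since the report does not say how variation was computed. The level conversion assumes the stated calibration point (67 dBC SPL recorded at -30 dBFS) and a linear response.

```python
TARGET_SPL_DBC = 67.0   # calibration point from the text
TARGET_DBFS = -30.0     # level in the audio file at that SPL
OFFSET_DB = TARGET_SPL_DBC - TARGET_DBFS  # 97 dB between the two scales


def dbfs_to_spl(level_dbfs):
    """Map a recorded level (dBFS) to sound pressure level (dBC SPL)."""
    return level_dbfs + OFFSET_DB


def half_range(levels):
    """Assumed definition of '+/-' variation: half of (max - min)."""
    return (max(levels) - min(levels)) / 2.0


def within_tolerance(levels, tolerance_db):
    """True if the readings vary by no more than +/- tolerance_db."""
    return half_range(levels) <= tolerance_db


# Made-up readings (dBFS) for six units, two passes each.
first_pass = [-30.9, -31.1, -31.0, -30.8, -31.2, -31.0]
second_pass = [-30.95, -31.05, -31.0, -30.85, -31.15, -31.05]

# Inter-recorder check: compare units against each other within one pass.
inter_ok = within_tolerance(first_pass, 0.5)

# Intra-recorder check: compare the two passes of each unit.
intra_ok = all(
    within_tolerance([a, b], 0.1) for a, b in zip(first_pass, second_pass)
)
```

With these made-up readings both checks pass, which mirrors the reported figures of less than +/- 0.5 dB between units and less than +/- 0.1 dB within a unit.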
Table 1: Compliance standards met or exceeded by the LENA DLP (DLP-0121).

Standards for Compliance    Description
UL 60065                    UL Standards for audio, video, and similar electronic apparatus: Safety requirements
CAN/CSA-C22.2 No. 60065     Canada Standard for audio, video, and similar electronic apparatus: Safety requirements
UL 696                      UL Standard for Safety: Electric Toys
EN 55022                    EU Standard for Information Technology: Radio disturbance characteristics

UL: Underwriters Laboratories Inc; EU: European Union

3.2 Hardware
Audio data were collected using an omnidirectional microphone with a flat 20 Hz-20 kHz frequency response. Extreme frequencies were suppressed, as they were unlikely to contain human speech activity. Low-frequency data were suppressed through a 70 Hz high-pass filter. Digital data were recorded using a 10 kHz low-pass filter to suppress high-frequency sounds. Signals were digitized using a 16 kHz, 16-bit sigma-delta analog-to-digital converter (ADC) with 8x over-sampling digital interpolation.

Initially, audio data were written to 512 MB flash memory using a 4:1 Adaptive Differential Pulse Code Modulation compression scheme (DVI-4 ADPCM). The flash memory uses an internal error correcting code (ECC) for data storage and recovery. Complete discharge of the battery will not result in loss of audio data. Data were uploaded to a host computer through a USB 2.0 high-speed port with a sustained audio transfer rate to host of approximately 4 MB/sec (~2.5 minutes per 16 hours of audio). Once uploaded, the data were decompressed to the PCM audio format with one 16-bit channel at a 16 kHz sample rate.

The DLP-0121 unit peak operating power is 50 mW. A primary 450 mAh battery provides a minimum of 30 hours of recording when new. The recording is safely discontinued when battery power is depleted. The DLP contains a real-time clock (RTC) for time-stamping recordings, as well as providing a time base for built-in ADC sample rate calibration. The unit comes equipped with dedicated real-time clock battery power with a life of approximately 5 years.
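The storage and transfer figures above can be cross-checked with a short calculation. This is a rough sketch under stated assumptions: megabytes are taken as decimal (1 MB = 1,000,000 bytes), file and protocol overhead are ignored, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000     # from the text
BITS_PER_SAMPLE = 16        # from the text, one channel
COMPRESSION_RATIO = 4       # 4:1 ADPCM, from the text
FLASH_MB = 512              # from the text
TRANSFER_MB_PER_S = 4.0     # approximate sustained rate, from the text
BYTES_PER_MB = 1_000_000    # assumption: decimal megabytes


def pcm_bytes_per_second():
    """Uncompressed data rate of one 16-bit channel at 16 kHz."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8


def compressed_mb(hours):
    """Approximate size on the DLP of a recording of the given length."""
    seconds = hours * 3600
    compressed_rate = pcm_bytes_per_second() / COMPRESSION_RATIO
    return compressed_rate * seconds / BYTES_PER_MB


def transfer_minutes(size_mb):
    """Nominal upload time at the stated sustained transfer rate."""
    return size_mb / TRANSFER_MB_PER_S / 60


size_16h = compressed_mb(16)                 # about 460.8 MB
fits_in_flash = size_16h <= FLASH_MB
upload_minutes = transfer_minutes(size_16h)  # about 1.9 minutes
```

Under these assumptions a 16-hour recording occupies about 461 MB compressed, which fits in the 512 MB flash memory, and the nominal upload takes about 1.9 minutes. That is the same order as the roughly 2.5 minutes quoted above; the difference could reflect overhead or a sustained rate somewhat below 4 MB/sec, though the report does not say.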
3.3 Simple Operation
The LENA DLP was designed for usability. It is equipped with a power switch and a test button (Figure 3). A visual feedback mechanism allows the user to easily identify when the unit is sleeping or recording, as well as the battery status. The unit easily attaches to LENA-designed clothing in a protective pocket that snaps shut. The DLP is compact (3-3/8" x 2-3/16" x 1/2") and of minimal weight (< 2 oz) in relation to children, thus minimizing the distraction associated with the presence of the recorder.

Figure 3. The LENA digital language processor (DLP-0121), actual size.

4.0 Conclusion
We have described the four components of the audio processing system: information flow, processing, statistical modeling, and transcriptions used for model training. Features extracted from the audio are segmented through iterative modeling processes into categorical components including male and female adult, key child, other child, overlapping speech, noise, electronic noise, and silence. Key child segments are further segmented into speech/non-speech, and adult segments are processed to generate AWC estimates. Adult-child alternations are processed into CT estimates. The LENA DLP is simple to operate and the inter-unit signal variation is low, as assessed by test-retest reliability. The DLP model DLP-0121 has either met or exceeded US/Canada compliance standards.
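As an illustration of the conversation definition given in Section 2.0 (regions of live human speech separated by a pause of at least five seconds), the grouping rule can be sketched as follows. This is a simplified sketch, not the LENA algorithm: the segment tuples and label strings are hypothetical, overlapping-sound segments are left out for simplicity, and alternations are counted by plain adjacency, whereas the report states that statistical modeling is used to detect Conversational Turns.

```python
MIN_PAUSE_S = 5.0  # pause length from the conversation definition

# Hypothetical label strings for live human speech segments.
LIVE_SPEECH = {"key_child", "other_child", "adult_male", "adult_female"}
ADULT = {"adult_male", "adult_female"}


def split_conversations(segments, min_pause=MIN_PAUSE_S):
    """Group time-ordered (start, end, label) segments into conversations.

    Only live human speech segments are kept. A new conversation starts
    when the gap since the previous speech segment is at least min_pause.
    """
    conversations = []
    current = []
    last_speech_end = None
    for start, end, label in segments:
        if label not in LIVE_SPEECH:
            continue
        if current and start - last_speech_end >= min_pause:
            conversations.append(current)
            current = []
        current.append((start, end, label))
        last_speech_end = end
    if current:
        conversations.append(current)
    return conversations


def count_alternations(conversation):
    """Count adjacent key child/adult alternations within one conversation."""
    count = 0
    for (_, _, prev), (_, _, nxt) in zip(conversation, conversation[1:]):
        pair = {prev, nxt}
        if "key_child" in pair and pair & ADULT:
            count += 1
    return count


# Made-up example segments: (start_s, end_s, label)
example = [
    (0.0, 1.0, "adult_female"),
    (1.5, 2.0, "key_child"),
    (2.0, 3.0, "noise"),
    (3.5, 4.0, "adult_male"),
    (10.0, 11.0, "key_child"),
    (11.0, 12.0, "tv"),
    (12.5, 13.0, "other_child"),
]
conversations = split_conversations(example)
alternations = [count_alternations(c) for c in conversations]
```

Because alternations are counted separately inside each conversation, none of them can cross a conversation boundary, which matches the constraint stated for CTs.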