VECSYS LIMSI ARCHITECTURE
Samir Bennacef, Vecsys
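Centralized Architecture
Fig 1. Spoken Dialogue System Architecture
[Figure: the Telephone Interface passes speech to the Speech Recognizer (acoustic models, language models), which outputs text to the Semantic Analyzer (caseframe grammar). The resulting semantic frame goes to the Dialog Manager (task model), which exchanges queries and results with Information Retrieval (database). The Dialog Manager sends a semantic frame to the Sentence Generator (generation grammar); the generated sentence goes to Speech Synthesis (unit dictionary) and back out through the Telephone Interface.]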
Telephone Interface
  phone program
    Input: commands and speech
    Output: events and speech
  Recording and playback, DTMF detection and generation, pickup, hangup and call transfer
  Hardware echo cancellation
  Barge-in based on adaptive speech detection
  NMS QX2000 hardware
  Computer Telephony Access API
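The slide does not say how the adaptive speech detection behind barge-in works. As a minimal sketch of one common approach, the function below tracks a running noise-floor estimate and declares speech once several consecutive frames exceed it by a margin. The function name, the parameters (`margin`, `adapt`, `min_frames`) and their defaults are all assumptions for illustration, not values taken from the phone program or the NMS hardware.

```python
def detect_speech(frame_energies, margin=4.0, adapt=0.1, min_frames=3):
    """Return the index of the first frame of detected speech, or None.

    Hypothetical sketch: the noise floor is an exponential average of the
    energy of frames judged to be non-speech. Speech is declared after
    `min_frames` consecutive frames whose energy exceeds `margin` times
    the current floor.
    """
    if not frame_energies:
        return None
    noise_floor = frame_energies[0]
    run = 0  # number of consecutive frames above the threshold
    for i, energy in enumerate(frame_energies):
        if energy > margin * noise_floor:
            run += 1
            if run >= min_frames:
                return i - min_frames + 1
        else:
            run = 0
            # Adapt the floor only on frames judged to be non-speech.
            noise_floor = (1 - adapt) * noise_floor + adapt * energy
    return None
```

In a barge-in setting, a detection while a prompt is playing would presumably stop playback and start recording; that wiring is not shown here.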
Speech Recognizer
  Cepstral features computation: sig2mfcc
    Input: speech recorded by the phone program
    Output: a 13-component cepstral vector every 10 ms, computed from speech sampled at 8 kHz
  Speech recognizer: nsearch
    Inputs: commands and cepstral coefficients
    Output: recognized text
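The 10 ms frame rate can be checked against the sig2mfcc options in the C-shell script (-w240 -s80 -r8000). The sketch below assumes, without confirmation from the source, that -w and -s are the window length and shift in samples and -r is the sampling rate in Hz. It covers only the framing step, not the cepstral computation, and the function names are illustrative.

```python
SAMPLE_RATE = 8000  # Hz, assumed meaning of -r8000
WINDOW = 240        # samples per analysis window, assumed meaning of -w240
SHIFT = 80          # samples between window starts, assumed meaning of -s80


def to_ms(n_samples, rate=SAMPLE_RATE):
    """Convert a number of samples to milliseconds."""
    return 1000.0 * n_samples / rate


def frame_signal(samples, window=WINDOW, shift=SHIFT):
    """Split a list of samples into overlapping analysis frames.

    Only complete windows are kept; a trailing partial window is dropped.
    """
    frames = []
    start = 0
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += shift
    return frames
```

Under these assumptions, `to_ms(SHIFT)` gives 10.0 and `to_ms(WINDOW)` gives 30.0, so one feature vector is produced every 10 ms from a 30 ms window.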
Semantic Analyzer
  Lexical normalization and labelling: sentprocess
    Input: recognized sentence
    Output: labelled sentence
  Caseframe analysis: cases
    Input: labelled sentence
    Output: semantic frame
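The two stages can be illustrated with a toy example. The lexicon and the label-to-slot table below are hypothetical stand-ins for rules.txt and caseframe.txt, whose formats are not given in the source; the labels are modelled on the trace later in the deck. This is a sketch of the idea, not of how sentprocess or cases are implemented.

```python
# Hypothetical lexicon: normalized word -> label (stand-in for rules.txt).
LEXICON = {
    "paris": "$place",
    "lille": "$place",
    "pour": "*to",
    "demain": "demain",
    "matin": "*matin",
}

# Hypothetical caseframe: label -> frame slot (stand-in for caseframe.txt).
CASEFRAME = {
    "$place": "place",
    "demain": "departure-date",
    "*matin": "departure-period",
}


def label_sentence(sentence):
    """Lowercase each word and attach its label; drop sentence markers
    and words that are not in the lexicon."""
    labelled = []
    for word in sentence.split():
        if word in ("<s>", "</s>"):
            continue
        key = word.lower()
        if key in LEXICON:
            labelled.append((LEXICON[key], key))
    return labelled


def fill_frame(labelled):
    """Map labelled words to (slot, value) pairs, in sentence order."""
    frame = []
    for label, word in labelled:
        slot = CASEFRAME.get(label)
        if slot is not None:
            frame.append((slot, word))
    return frame
```

For the recognized string `<s> Paris Lille pour demain matin </s>`, `label_sentence` yields two `$place` items followed by `*to`, `demain` and `*matin`, and `fill_frame` turns them into two `place` slots plus a departure date and period. Deciding which place is the origin is left to the dialog stage.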
Dialog Manager
  Dialog
    Input: semantic frame resulting from cases
    Output: semantic frame to be converted into natural language
  Contextual understanding
  Database query generation
  Semantic frame generation
  Uses a powerful scripting language
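As a rough illustration of database query generation, the sketch below builds a request string in the same shape as the one shown in the trace later in the deck. The rule that the first place mentioned is the origin and the second the destination is an assumption made here for illustration, as are the function names. The mapping from "demain" and "*matin" to a concrete date and hour is taken as given.

```python
# Column list copied from the request shown in the trace.
COLUMNS = ["from", "deph", "to", "arrh", "chg", "day",
           "stopa", "stopah", "stopd", "stopdh", "stopdur", "type"]


def constraints_from_frame(places, day, arrival_hour):
    """Turn frame values into (field, operator, value) constraints.

    Assumption: places are listed in the order origin, destination.
    """
    constraints = []
    if len(places) >= 1:
        constraints.append(("from", "=", places[0]))
    if len(places) >= 2:
        constraints.append(("to", "=", places[1]))
    constraints.append(("day", "=", day))
    constraints.append(("arrh", "~=", arrival_hour))
    return constraints


def build_request(constraints, columns=COLUMNS):
    """Format the request; '=' is written without spaces, other operators with."""
    parts = []
    for field, op, value in constraints:
        parts.append(f"{field}={value}" if op == "=" else f"{field} {op} {value}")
    return f"select {', '.join(columns)} WHERE {' AND '.join(parts)}"
```

Calling `build_request(constraints_from_frame(["paris", "lille"], "17/5/101", "1000"))` reproduces the request text from the trace.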
Natural Language Generator
  Genere
    Input: semantic frame resulting from dialog
    Output: natural language sentence
  Uses hierarchical rules
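The source does not describe the rule format used by genere. As a minimal sketch of what hierarchical generation rules can look like, the example below lets a rule refer to sub-rules with `<name>` and to frame slots with `[name]`. The notation, the rule names and the English templates are all invented for illustration.

```python
import re

# Hypothetical rule set: "<x>" expands another rule, "[x]" reads a frame slot.
RULES = {
    "answer": "There is a train from [from-place] to [to-place] <schedule>.",
    "schedule": "leaving at [dep] and arriving at [arr]",
}


def generate(rule, frame, rules=RULES):
    """Expand a rule recursively, then fill its slots from the frame.

    Raises KeyError if a rule or a slot is missing.
    """
    text = rules[rule]
    # Expand sub-rules first (depth-first), so rules can nest.
    text = re.sub(r"<([\w-]+)>",
                  lambda m: generate(m.group(1), frame, rules), text)
    # Then substitute slot values taken from the semantic frame.
    return re.sub(r"\[([\w-]+)\]", lambda m: str(frame[m.group(1)]), text)
```

With a frame holding `from-place`, `to-place`, `dep` and `arr`, `generate("answer", frame)` returns a single sentence in which the schedule sub-rule has been expanded in place.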
Information Retrieval Interface
  Dbserver
    Input: SQL query
    Output: database result
  Parses and translates the query
  Retrieves information from the target database
  Provides the result table
Speech Synthesis System
  Syn
    Input: sentence resulting from genere
    Output: speech signal, which is played by the telephone interface
  Uses a unit dictionary
  Selects the best sequence of units using a dynamic programming algorithm
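The slide names dynamic programming but gives neither the cost functions nor the unit inventory. The sketch below is a generic Viterbi-style search over candidate units, assuming a per-unit target cost and a pairwise join cost supplied by the caller. These cost functions and the function name are hypothetical, and this is not necessarily how syn scores units.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position so the summed cost is minimal.

    candidates: list (one entry per position) of non-empty lists of units.
    target_cost(i, unit): cost of using `unit` at position i.
    join_cost(prev_unit, unit): cost of concatenating two adjacent units.
    Returns (total_cost, chosen_units).
    """
    if not candidates:
        return 0, []
    # Each table entry is (best cost so far, index of best predecessor).
    prev = [(target_cost(0, u), None) for u in candidates[0]]
    history = [prev]
    for i in range(1, len(candidates)):
        cur = []
        for unit in candidates[i]:
            best_cost, best_k = min(
                (prev[k][0] + join_cost(candidates[i - 1][k], unit), k)
                for k in range(len(prev))
            )
            cur.append((best_cost + target_cost(i, unit), best_k))
        history.append(cur)
        prev = cur
    # Backtrack from the cheapest final entry.
    j = min(range(len(prev)), key=lambda k: prev[k][0])
    total = prev[j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = history[i][j][1]
    path.reverse()
    return total, path
```

The search costs on the order of the number of positions times the square of the number of candidates per position, which is why dynamic programming is preferred to enumerating every sequence.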
C-shell script

# ------------------------ Phone interface ----------------------- #
rsh $remote $bin/phone.exe h$dialhost $dialport t70 x8192 n2 \
    l2 g f$cfg/cta.cfg a$data&

# -------------------- Speech recognizer loading ----------------- #
# ---- SigToCep ---- #
set CEP = \
    $bin/sig2mfcc -w240 -s80 -l20 -n12 -r8000 -b0:3500 -c -en0-0 \
    --$fifo/tosentrec.fifo$i $fifo/tosig2mfcc.fifo$i -:

# ---- Speech Recognizer ----#
set RECO = \
    $bin/nsearch -@$phones -d$fifo/tosentrec.fifo$i -t \
    -p${plist}:$stbl -s0:160:0:f -l$voc -z3 -w4:25 -n1 -q63,12:8:3 \
    -zb$tg -zw30 -zr -xg$gsl -zy$clst -sw50 -sh25000 \
    -cmr${cepmean}:0.996 -en4.5 -- $hmm -xf

$bin/recocheck -r $fifo/torecord.fifo$i -c$cep d$reco \
    -t$fifo/fromdial.fifo$i -v < \
    $fifo/pushtotalk.fifo$i > $fifo/tocases.fifo$i &
#-------- Semantic Analyzer and Dialogue loading ---------#
$bin/sentprocess -k -t -d -c -v2 $dial/rules.txt < \
    $fifo/tocases.fifo$i | \
$bin/cases -k -o -m -v $dial/caseframe.txt | \
$bin/dialogue -i -v1 $dial/task.txt $dial/dial.arg \
    -tr$fifo/pushtotalk.fifo$i -fp$fifo/fromplay.fifo$i \
    -fn$fifo/todial.fifo$i -rf$tmp/reco.tmp$i \
    -e$fifo/fromdial.fifo$i -fg$fifo/fromgenere.fifo$i \
    -tt$fifo/todb.fifo$i -ft$fifo/fromdb.fifo$i | \
$bin/genere $dial/genere.txt -f$fifo/fromgenere.fifo$i v > $fifo/tosyn.fifo$i&
# ----------------------- Dispatcher ----------------------------- #
$bin/dispatcher -v -p$synt/sig/prompt.sig -f$synt/sig/dtmf.sig \
    -l"$logcmd" -s$fifo/torecord.fifo$i -db$fifo/fromplay.fifo$i \
    -dt$fifo/todial.fifo$i -df$fifo/fromdial.fifo$i -dp$dialpid \
    -kf$fifo/fromdbconn.fifo$i -kt$fifo/todb.fifo$i \
    -kw$synt/sig/waitdb.sig -kl$synt/sig/wait.sig -v r \
    < $fifo/fromphone.fifo$i > $fifo/tophone.fifo$i &

# ------------------- Database Loading --------------------------- #
$bin/dbserver -t$fifo/todbtarg.fifo$i -f$fifo/fromdbtarg.fifo$i \
    -c$db/table1.txt -s$db/table2.txt -p$db/table3.txt \
    -d$fifo/fromdbconn.fifo${i}:120 -m10 -a -v2 \
    < $fifo/todb.fifo$i > $fifo/fromdb.fifo$i &

# -------------------- Synthesis loading ------------------------- #
$bin/syn -s${sig}:2 -l$wd -w4:2:0 -o$fifo/toplay.fifo$i -c \
    $synt/wdlist.lst &
$bin/play -i$fifo/toplay.fifo$i -o$fifo/tophone.fifo$i -p v &
How the system works

server.csh: telephone interface loading
server.csh: speech recognizer loading
server.csh: dialog loading
server.csh: dispatcher loading
server.csh: dbserver loading
server.csh: synthesis loading
telephone: pickup
telephone: line number=[0]
telephone: play
telephone: get dtmf [*]
dialogue: frame: { concept: (acte formalite-ouverture). }
genere: Quel voyage souhaitez-vous effectuer? [Which journey would you like to make?]
telephone: play
telephone: end of play
telephone: recording
nsearch: <s> Paris Lille pour demain matin </s> [Paris Lille for tomorrow morning]
sentprocess: Paris -> $place
             Lille -> $place
             matin -> *matin
             $place(paris) $place(lille) *to(pour) demain(demain) *matin(matin)
cases: <defaut> {
    place: Paris.
    place: Lille.
    departure-period: *matin.
    departure-date: demain.
}
dialogue: request=[select from, deph, to, arrh, chg, day, stopa, stopah, stopd, stopdh, stopdur, type WHERE from=paris AND to=lille AND day=17/5/101 AND arrh ~= 1000]
dbserver: target query=[00043 00000001? 12 FRPAR FRLIL 17 MAY 1000]
dbserver: result=[1 ( from deph to arrh chg day stopa stopah stopd stopdh stopdur type )( Paris-Gare-du-Nord 0858 Lille-Flandres 0959 0 17/5/101 ----- 0959 ----- ----- ----- TGN )]
dialogue: {
    concept: (acte response) (type positive) (value train-hour).
    nb-trains: (value 1).
    concept2: (acte confirmation) (value hour).
    from-place: (value Paris-Gare-du-Nord).
    to-place: (value Lille-Flandres).
    departure-wday: (value jeudi).
    departure-day: (value 17/5/101).
    departure-period: (value *matin).
    stop: (value 0).
    sched: (dep 0858) (arr 0959).
}
genere: le matin, jeudi dix-sept mai vous avez un train de Paris-Nord à Lille-Flandres à huit heures cinquante-huit arrivant à neuf heures cinquante-neuf. Cet horaire vous convient-il?
  [In the morning, Thursday May seventeenth, you have a train from Paris-Nord to Lille-Flandres at eight fifty-eight, arriving at nine fifty-nine. Does this schedule suit you?]
nsearch: <s> oui </s> [yes]
cases: <defaut> { mode: *affirmatif. }
dialogue: { concept: (acte relance) (value retour). }
genere: Souhaitez-vous le retour? [Would you like the return trip?]
nsearch: recognized string: <s> non merci </s> [no thank you]
genere: vous avez donc un aller Paris-Nord Lille-Flandres le jeudi dix-sept mai départ huit heures cinquante-huit, arrivée neuf heures cinquante-neuf. Souhaitez-vous un autre trajet?
  [So you have a one-way trip Paris-Nord Lille-Flandres on Thursday May seventeenth, departing at eight fifty-eight, arriving at nine fifty-nine. Would you like another journey?]
nsearch: recognized string: <s> non merci </s> [no thank you]
genere: au-revoir, le système Recital vous remercie et vous souhaite un bon voyage.
  [Goodbye, the Recital system thanks you and wishes you a pleasant journey.]
telephone: hangup

Des Hommes de Parole, October 2001
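The dbserver result line in the trace has a regular shape: a row count, a parenthesized list of column names, then one parenthesized group per row. A minimal parser for that shape might look like the sketch below. It assumes values never contain spaces or parentheses, which holds for the row shown but is not guaranteed by the source, and the function name is illustrative.

```python
import re


def parse_result(text):
    """Parse '[N ( col ... )( val ... )...]' into (N, list of row dicts).

    Assumption: column names and values contain no spaces or parentheses.
    """
    body = text.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    count_str, _, rest = body.strip().partition("(")
    # Re-attach the parenthesis consumed by partition, then grab each group.
    groups = re.findall(r"\(([^()]*)\)", "(" + rest)
    if not groups:
        return int(count_str), []
    columns = groups[0].split()
    rows = [dict(zip(columns, group.split())) for group in groups[1:]]
    return int(count_str), rows
```

Applied to the result in the trace, this yields one row in which `deph` is `0858`, `arrh` is `0959` and `type` is `TGN`, which are the values the dialogue frame then reports as `sched`.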
Distributed Architecture
[Figure: Host 1 (Master) and Host n (Slave) each run a Vnetd daemon. Host 1 provides the recognizer, dialogue, speech synthesis, net audio server and other services; Host n provides the recognizer, dialogue, speech synthesis and other services. The hosts and the client applications 1, 2, ..., m communicate over the network (TCP/UDP) through an Application Programming Interface that covers data exchange protocols and service name-address resolution.]
Services
  1. Audio
  2. Speech recognition
  3. Dialog (understanding, dialog and generation)
  4. Information retrieval
  5. Speech synthesis
  6. Application manager
Galaxy Communicator
  Similarities between GC and Oasis
    A distributed client/server architecture
    A central manager: the hub in GC and the application manager in Oasis
    A set of services listening for client connections and requests
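The shared pattern, a central manager routing requests to named services, can be sketched in a few lines. The class below is an in-process toy, not the Galaxy Communicator API and not the Oasis API. The class name, method names and example services are all hypothetical, and real services would sit behind network connections rather than local function calls.

```python
class Hub:
    """Toy central manager: maps service names to handlers and routes frames."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Make a service reachable under `name`."""
        self._services[name] = handler

    def dispatch(self, name, frame):
        """Send a frame to one service and return its reply frame."""
        try:
            handler = self._services[name]
        except KeyError:
            raise LookupError(f"no service registered as {name!r}") from None
        return handler(frame)

    def run_chain(self, names, frame):
        """Pass a frame through several services in order."""
        for name in names:
            frame = self.dispatch(name, frame)
        return frame


hub = Hub()
# Stand-in services; each takes a frame (dict) and returns an extended frame.
hub.register("recognition", lambda f: {**f, "text": "paris lille"})
hub.register("dialog", lambda f: {**f, "reply": "ok: " + f["text"]})
result = hub.run_chain(["recognition", "dialog"], {"audio": b""})
```

Because services only ever talk to the manager, adding or moving a service means changing one registration rather than every client.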
Make Oasis Services GC Compliant
  Include the GC server functions in all services:
    perform initialization
    include a dispatch function
    invoke the hub by using the GalIO_Comm family of functions
  Use the brokering mechanism
Tests and Evaluation
  The speech recognizer only
  The dialog connected to the database
  The dialog with the recognizer
  The whole system
  Supported platforms: DEC, SGI, Linux, Windows