SIV for VoiceXML 3.0: Language and Application Design Considerations
Ken Rehor
Cisco Systems, Inc.
krehor@cisco.com
March 05, 2009
VoiceXML Application Architecture
  [Diagram: PSTN / VoIP; VoIP Gateway; Server (HTTP); IP; ASR; TTS; Audio; DTMF; Verification engine; VP DB]
SIV in VoiceXML 2.x
  Server-side SIV processing
    <record>
    <field> with recordutterance
  Language extensions
    Nuance "voiceprint forms"
    BeVocal
VoiceXML 2.x SIV Integration
  [Diagram: PSTN / VoIP; recordutterance, <record>; Server (HTTP); <subdialog> (HTTP); IP; Verification engine; VP DB]
Standard VoiceXML prompt/field model
  Text-independent
    <prompt> / <record>
    Submit recording to application server
  Text-dependent, Text-prompted
    <prompt> / <field> (with recordutterance)
    Submit utterance recording to application server
VoiceXML 2.x <record>

  <form id="verify">
    <!-- could use external grammar -->
    <record name="utterance" maxtime="5s">
      <prompt>Say this digit sequence: one two three four five.</prompt>
      <noinput>I didn't hear anything, please try again.</noinput>
    </record>
    <block>
      <!-- post the recorded audio to the application server -->
      <submit next="check_utterance.pl" enctype="multipart/form-data"
              method="post" namelist="utterance"/>
    </block>
  </form>
VoiceXML 2.1 <field>

  <form id="verify">
    <field type="digits">
      <prompt>Say this digit sequence: one two three four five.</prompt>
      <filled>
        <!-- if spoken digits match expected response, then process voice model -->
      </filled>
    </field>
  </form>
VoiceXML 2.1 <field> with recordutterance

  <form id="verify">
    <!-- keep the audio of each recognized utterance -->
    <property name="recordutterance" value="true"/>
    <field type="digits">
      <prompt>Say this digit sequence: one two three four five.</prompt>
      <filled>
        <!-- if spoken digits match expected response, then process voice model -->
      </filled>
    </field>
  </form>
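One way the commented step in the <filled> handler might be filled in, as a sketch under stated assumptions rather than a definitive implementation. The field name spoken and the variables expected and utterance are hypothetical, and check_utterance.pl is simply reused from the <record> example. With recordutterance enabled, VoiceXML 2.1 exposes the caller's audio through the shadow variable spoken$.recording, which can then be posted to the application server.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="verify">
    <property name="recordutterance" value="true"/>
    <!-- Hypothetical: the digit string the caller was asked to say -->
    <var name="expected" expr="'12345'"/>
    <!-- Hypothetical: receives the utterance audio once the field is filled -->
    <var name="utterance"/>
    <field name="spoken" type="digits">
      <prompt>Say this digit sequence: one two three four five.</prompt>
      <filled>
        <if cond="spoken == expected">
          <!-- Shadow variable set by the platform when recordutterance is true -->
          <assign name="utterance" expr="spoken$.recording"/>
          <!-- Assumed URL, reused from the record example -->
          <submit next="check_utterance.pl" method="post"
                  enctype="multipart/form-data" namelist="utterance"/>
        <else/>
          <prompt>That did not match. Please try again.</prompt>
          <clear namelist="spoken"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

The text check runs in the browser, so only utterances with the expected content reach the verification engine. Whether that split is acceptable depends on how far the deployment trusts the VoiceXML platform.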
Security Concerns
Architecture / Security / Trust
  One architecture may not be suitable for every use case
    Some architectures may not support the level of (dis)trust required for a particular deployment
Security, Trust and Protocol Considerations in Distributed Voice Web Applications
  Architecture options carry security implications
  [Diagram: PSTN or IP network; browser (<vxml>, .wav); Voice Web Server; Authentication Web Service; Other Web Services; MRCP Server with TTS (<ssml>) and ASR (<grxml>); Voice template database (voice template), shown in two places; Web Service interface]
  VoiceDEFF may be used between components and services
SIV engine and database managed by App server
  VoiceXML browser records the utterance and forwards it to the app server (typical scenario for VoiceXML 2.0 / 2.1)
  [Diagram: PSTN or IP network; browser (<vxml>, <record>, .wav) with MRCP Client; audio to Voice Web Server; MRCP Server with TTS (<ssml>) and ASR (<grxml>); Voice template database (voice template)]
  Note: DTMF processing not shown
  Voice templates managed and stored locally by engine
  Audio stream vs. buffers
    Streaming handled by RTP?
    Buffers may be handled by audio recorder function. Part of browser or MRCP engine?
SIV engine and database managed by App server
  VoiceXML browser records the utterance and forwards it to the app server (typical scenario for VoiceXML 2.0 / 2.1)
  [Diagram: Service Provider; PSTN or IP network; browser (<vxml>, <record>, .wav) with MRCP Client; audio to Voice Web Server; IP; second Voice Web Server; MRCP Server with TTS (<ssml>) and ASR (<grxml>); Voice template database (voice template)]
  Note: DTMF processing not shown
  Voice templates managed and stored locally by engine
SIV engine and database managed by MRCP server
  [Diagram: PSTN or IP network; browser (<vxml>, .wav) with MRCP Client; Voice Web Server; audio to MRCP Server with TTS (<ssml>) and ASR (<grxml>); Voice template database (voice template)]
  Note: DTMF processing not shown
  Voice templates managed and stored locally by engine
  Audio stream vs. buffers
    Streaming handled by RTP?
    Buffers may be handled by audio recorder function. Part of browser or MRCP engine?
SIV engine managed by MRCP server
SIV database managed by app server
Voice model transmission managed by engine or MRCP Server
  [Diagram: PSTN or IP network; browser (<vxml>, .wav) with MRCP Client; Voice Web Server; audio to MRCP Server with TTS (<ssml>) and ASR (<grxml>); Voice template database; voice template passed to the engine]
  Note: DTMF processing not shown
  Voice templates retrieved from database by app server
SIV engine managed by MRCP server
SIV database managed by app server
Voice model transmission managed by VoiceXML browser
  [Diagram: PSTN or IP network; browser (<vxml>, .wav) with MRCP Client; Voice Web Server; audio to MRCP Server with TTS (<ssml>) and ASR (<grxml>); Voice template database; voice template passed through the browser]
  Note: DTMF processing not shown
  Voice templates managed and stored locally by engine
  Voice templates retrieved from database by app server
SIV in VoiceXML 3.0
V3 Integration Requirements
  Control multiple Input Resources
    ASR and biometric engines
    Simultaneously
    Switch on a per-<field> or verification basis
  Consistent with V3 overall design goals
    Simplify integration, yet provide sufficient control
V3 Data, Event relationship between components
  [Diagram: Resource Controller (an object with semantics similar to form item); commands from other resource controllers; mark, data events; prompt queue (barge-in on/off, done; Stop, Play); Add grammar(), Add voiceprint(); inputs Input, Input 2, Input 3 (Stop, Play); events: mark, error, done; media: audio, DTMF]
  Resources
    SSML/media player
    recognition device(s)
    audio recorder
    verification, etc.
  Inputs are all session-level
  Recording types to consider:
    <record>
    Utterance recording
    Whole-call recording (two-channel?)
    Multi-turn recording (e.g. mixed-initiative recording)
"Session"
  Enrollment Session or Verification Session
  Verification process: an uninterrupted process over several dialog states (having a Session-ID) in which the results of each utterance are accumulated
  [Diagram: VoiceXML Session; Verification Session; SIV dialog, SIV dialog, SIV dialog]
Define Data Model
  Data passed to SIV engine
    Environment
    Properties
    Attributes
    Voice models
  Data returned from SIV engine
    Results specified as an EMMA result
    Errors / info
  Data used within SIV session
    Associate SIV result with ASR result
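To make "results specified as an EMMA result" concrete, here is a hedged sketch of what a verification result could look like. The emma:medium, emma:mode, emma:function and emma:confidence annotations come from EMMA 1.0; the payload elements claimedIdentity and decision, and all values shown, are assumptions for illustration only and are not defined by EMMA or by any SIV standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="siv1"
                       emma:medium="acoustic"
                       emma:mode="voice"
                       emma:function="verification"
                       emma:confidence="0.87">
    <!-- Hypothetical application payload: EMMA does not define these elements -->
    <claimedIdentity>user-1234</claimedIdentity>
    <decision>accepted</decision>
  </emma:interpretation>
</emma:emma>
```

Because ASR results can be carried in the same container format, using EMMA for both would give the application one place to associate the SIV result with the ASR result for the same utterance.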
Define event model
  Combine references from:
    VoiceXML Forum
    MRCPv2
    Engine vendors
VoiceXML and Web Services
VoiceXML 2.x/3.x SIV Integration via BIAS web service
  [Diagram: PSTN / VoIP; Browser (HTTP); recordutterance, <record>; IP; BIAS (Web Service); BioAPI; Verification engine; VP DB]
VoiceXML 2.x/3.x SIV Integration via <subdialog>
  [Diagram: PSTN / VoIP; Browser (HTTP); recordutterance, <record>; <subdialog> (HTTP); IP; Verification engine; VP DB]
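The <subdialog> route can be sketched in VoiceXML 2.1 as follows. This is an illustration under assumptions, not a reference implementation: verify.vxml is a hypothetical document served by the application server, and decision is an assumed return variable whose values are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <record name="utterance" maxtime="5s">
      <prompt>Say this digit sequence: one two three four five.</prompt>
    </record>
    <!-- Hypothetical URL: the server runs the verification engine on the posted audio -->
    <subdialog name="siv" src="verify.vxml" method="post"
               enctype="multipart/form-data" namelist="utterance">
      <filled>
        <!-- Assumed return variable, not part of any standard -->
        <if cond="siv.decision == 'accepted'">
          <prompt>Thank you, you are verified.</prompt>
        <else/>
          <prompt>Sorry, we could not verify your voice.</prompt>
        </if>
      </filled>
    </subdialog>
  </form>
</vxml>
```

In this arrangement the server would answer the POST with a small VoiceXML document that declares decision and ends with <return namelist="decision"/>, so the engine and voiceprint database stay behind the application server.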
VoiceXML 3.0 SIV Integration
  [Diagram: PSTN / VoIP; Browser (HTTP); BioAPI, MRCP, etc.; engine; VP DB]
  V3 SIV native language features
  Browser/engine integration via BioAPI, MRCP, proprietary API, etc.
VoiceXML 3.0 SIV Integration
  [Diagram: PSTN / VoIP; Browser (HTTP); <subdialog> (HTTP); IP; Verification engine; BioAPI, MRCP, etc.; engine; VP DB]
  V3 SIV native language features
  Browser/engine integration via BioAPI, MRCP, proprietary API, etc.
VoiceXML SIV Integration via BIAS web service or <subdialog>
  [Diagram: PSTN / VoIP; Browser (HTTP); recordutterance, <record>; <subdialog> (HTTP); IP; BIAS (Web Service); Verification engine; engine; VP DB]
VoiceXML Application Switching
  [Diagram: PSTN / VoIP; Browser (HTTP); recordutterance, <record>; <subdialog> (HTTP); IP; Verification engine; engine; VP DB]
Pros and Cons of Native V3 functions
V3 Native Functions: Pros and Cons
SIV engines controlled directly by VoiceXML
  Pros
    V3 Requirement: simultaneous processing
      More than one input engine, e.g. ASR + Verify
    Performance
      Speed / responsiveness
      Accuracy
    Consistency of resource control
      Aligned with other input resources
    Benefit to app developers
      Don't have to buy, install, maintain SIV engine
      Shared resource on VXML platform
    Benefit to Platform Vendors / Service Providers
      Shared resource
      Ease of deployment
      Enhanced service offering
  Cons
    Not available today; need interim solution
    App concerns
      Enables developers to do 'bad' things
    Security concerns
      Enables developers to do 'bad' things
    Full portability still a long way off
      Voice models, engine capabilities, results/errors, etc. are all proprietary
    Platform integration not standard yet
      MRCPv2 not sufficient; need more features (MRCPv3?)
Resource Control and Distributed Decision Making
  VoiceXML browser -- Resource Controller
  VoiceXML document/script
  [Diagram: Voice Web App (<vxml>) exchanging <ssml>, <grxml>, .wav, voice model and results with the browser's resources. Output: TTS, Audio player. Input: Audio recorder, ASR, DTMF]
  Input review
    ASR
      audio quality
      confidence / threshold
      result: word or phrase, or nomatch
    DTMF
      result: digit string or nomatch
  App-specific Decisions
    user selection
    authentication