Envox CDP 7.0
Performance Comparison of VXML and Envox Scripts

Goal and Conclusion

The focus of the testing was to compare the performance of VXML and ENS applications. It was found that VXML and ENS applications have very similar performance characteristics; the only significant difference appears when TTS is used in the applications. When playing audio files, both VXML and ENS applications can run on up to 240 channels without any significant delays. In TTS applications VXML generally performs worse than ENS, and in the worst case significant delays start above 60 channels. However, in an average TTS application Envox CDP 7.0 should be able to handle 90 VXML channels without significant delays, and it should be able to handle 120 ENS channels. VXML displays better performance than ENS in ASR applications: Envox CDP 7.0 should be able to handle 120 VXML channels with acceptable recognition delays (up to 3 seconds), and up to 90 ENS channels with acceptable recognition delays.

Please note:
- it is always possible that specific applications display results different from those obtained in this test report, due to their complexity, implementation or other factors,
- the results and conclusions published in this test report apply to the TDM setup only (either using digital hardware boards or using HMP interface "thin blade" boards),
- the results are expected to differ in an HMP setup (VoIP using SIP or H.323), because in that scenario the machine running HMP and Envox CDP 7.0 also has to process call control, making it slower (based on our previous HMP tests we know that processing call control is a bottleneck),
- in applications that handle a large number of short calls the HMP bottleneck will be higher and may reduce the maximum number of channels by as much as 50%,
- in applications that handle a smaller number of longer calls the bottleneck is not as apparent, but there will be some reduction in the maximum number of channels Envox CDP 7.0 can handle.
Environment

Network: Envox-lab LAN, 100 Mbit

Computers:

Name     CPU                        RAM     Windows OS    Software
Hertz    dual Intel Xeon 3.06 GHz   2 GB    2003 Server   HMP 3.0 SU
Tesla    Intel P4 1.7 GHz           0.5 GB  2003 Server   SR6.0 SU
Pascal   dual Intel Xeon 3.2 GHz    2 GB    2003 Server   HMP 3.0 SU

Additional software installed:
- Hertz: Dialogic HMP 3.0
- Tesla: Dialogic SR6.0 SU
- Pascal: OSR

Additional hardware installed:
- Hertz: DNI 1200 TEPHMP board (protocol: NET 5 E1)
- Hertz: DNI 1200 TEPHMP board (protocol: NET 5 E1)
- Tesla: DM/V 1200 board (protocol: ml2_qsa_net5 network)
- Tesla: DM/V 1200-A board (protocol: ml2_qsa_net5 network)

Measurements:

The reference computers for test measurements were Hertz and Tesla. All measurements were performed using the standard Windows Performance Monitor with the following counters:
- Processor(_Total)\% Processor Time
- Process(EnvoxEngine)\% Processor Time
- Process(EnvoxEngine)\Private Bytes
- Process(EnvoxEngine)\Virtual Bytes
- Process(EnvoxEngine)\Handle Count

The performance sampling interval was 30 seconds. After every test, Envox Engine was stopped, then the telephony service was restarted through DCM, and finally Envox Engine was started again. All Envox logging was disabled, as was Dialogic RTF logging.

Envox: CDP version 7.0.0.5194 (RC5 - Final)
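As an illustration (not part of the report's tooling), the same counter set could also be collected from the command line with the Windows typeperf utility; the sketch below just assembles the command with the report's 30-second sampling interval.

```python
# Sketch: building a typeperf command line for the report's counters.
# -si is the sample interval in seconds, -o the output CSV file.

COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Process(EnvoxEngine)\% Processor Time",
    r"\Process(EnvoxEngine)\Private Bytes",
    r"\Process(EnvoxEngine)\Virtual Bytes",
    r"\Process(EnvoxEngine)\Handle Count",
]

def typeperf_command(counters, interval_s=30, output_csv="perf.csv"):
    """Build the argument list for typeperf; run it with
    subprocess.run(...) on a Windows host."""
    return ["typeperf", *counters, "-si", str(interval_s), "-o", output_csv]

cmd = typeperf_command(COUNTERS)
```

Run on the measurement machine, this would log the five counters every 30 seconds, matching the intervals used in the report.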
Tests: Envox (ens) vs VXML (ens)

Testing will provide a comparison between an Envox script based on blocks and an Envox script with VXML. The following aspects will be tested:
- Play prompt from audio file
- Play prompt with TTS
- DTMF detection
- ASR
- Programming aspects (loops, branches, variables)
- Parsing XML file

Testing will be done on 30, 60, 90 and 120 channels with a duration of at least 30 minutes. Logging will be done on a single channel.

Play prompt from audio file

This test will have two scenarios:

1. Short call:
   - wait call
   - play wave file
   - release call

[Chart: Average call duration (seconds)]
[Chart: Average play overhead (seconds)]

[Chart: Calls in 20 minutes (number of calls)]
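As a rough sanity check (not taken from the report), the number of calls completed in a 20-minute window can be estimated from the channel count and the average call duration, assuming each channel runs calls back to back:

```python
def expected_calls(channels, avg_call_duration_s, window_s=20 * 60):
    """Rough upper bound on calls completed in the window:
    each channel runs back-to-back calls of the given average length."""
    return int(channels * window_s / avg_call_duration_s)

# With the short-call scenario's roughly 9.5 s average duration:
# 120 channels * 1200 s / 9.5 s -> about 15157 calls
```

Comparing such an estimate against the measured "Calls in 20 minutes" chart gives a quick check that no channels were starved during the run.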
2. Longer call:
   - wait call
   - play several wave files
   - release call

[Chart: Average call duration (seconds)]

[Chart: Average play(s) overhead (seconds)]

Conclusion

The short-call results from the application machine (Hertz) show that the duration of a play has an almost constant overhead at any number of channels within the tested range. The audio duration of the file used is 8,728 seconds. The slight growth of the average call duration is caused by longer call setup and call teardown times. The longer call, which consists of 10 different audio (wave) files with a total audio duration of 63,728 seconds, behaves similarly to the short-call scenario. The difference between classic Envox block usage and VXML is negligible.

Winner: none
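The play-overhead metric used above can be reproduced by subtracting the known audio duration from the measured play duration (a sketch, not the report's actual measurement code):

```python
AUDIO_DURATION_S = 8.728  # audio length of the short-call wave file

def play_overhead(measured_play_s, audio_s=AUDIO_DURATION_S):
    """Platform overhead added on top of the file's audio length."""
    return measured_play_s - audio_s

# e.g. a play measured at 8.9 s carries about 0.172 s of overhead
overhead = play_overhead(8.9)
```

An almost constant overhead across channel counts, as the report observed, would mean the play pipeline itself does not degrade with load.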
Play prompt with TTS

This test will have two scenarios:

1. Short call:
   - wait call
   - play TTS phrase
   - release call

[Chart: Average call duration (seconds)]

[Chart: Calls in 20 minutes (number of calls)]
2. Longer call:
   - wait call
   - play several different TTS phrases
   - release call

[Chart: Average call duration (seconds)]

[Chart: Calls in 20 minutes (number of calls)]

Conclusion

The short-call results are consistent for the 30- and 60-channel tests. At higher channel densities the results show unstable behavior. The longer call, which consists of 10 different phrases, is much more stable than the short calls. VXML shows different behavior, with a constant but noticeable delay over the block implementation on 30 and 60 channels, while at higher channel densities the delay grows exponentially.

Winner: ENS
DTMF detection

This test will involve one DTMF detection.

[Chart: Average call duration (seconds)]

[Chart: Average MakeCall duration (seconds)]

Conclusion

The call in this test is very short, which means that call setup and teardown have a large influence on the results. For instance, almost half of the delay comes from the prolonged call setup procedure; the other half is DTMF detection delay.

Winner: none
ASR

This test will focus on two aspects:
- Speed
- Accuracy

For speed we will measure the after-speech response time, and for accuracy the success or failure of recognition. This test will also cover two different grammars used by the ASR.

1. Simple grammar with only two items (yes and no)

[Chart: Average recognition duration (seconds)]
[Chart: Recognition accuracy (%) — data labels: 100; 99,96; 100; 99,92; 100; 99,32; 99,44; 98,84]
2. Grammar with 100 items

[Chart: Average recognition duration (seconds)]

[Chart: Recognition accuracy (%) — data labels: 100; 99,88; 100; 99,83; 99,75; 99,95; 98,09; 98,93]

Conclusion

In both grammar-complexity tests, recognition accuracy stayed at a very high rate, with practically no significant degradation at larger channel numbers. For ASR we used a dedicated Open Speech Recognizer (OSR) server installed on Pascal. In the worst-case scenario, Pascal's total CPU usage was around 50%. Interestingly, the VXML script has a lower average recognition duration than the block script, but also a slightly lower recognition accuracy.

Winner: VXML

Programming aspects

This test will involve a counter with 1000 iterations. Each even iteration will increase an integer variable, and each odd iteration will concatenate a letter to a string variable.

[Chart: Average call duration (seconds)]

Conclusion

As expected, VXML shows significantly better results than the block implementation.

Winner: VXML
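The workload in the programming-aspects test can be sketched as follows (a Python equivalent of the described loop, not the actual Envox or VXML script):

```python
def programming_aspects_workload(iterations=1000, letter="a"):
    """Alternate between integer arithmetic (even iterations) and
    string concatenation (odd iterations), as in the test script."""
    counter = 0
    text = ""
    for i in range(1, iterations + 1):
        if i % 2 == 0:
            counter += 1      # even iteration: increase integer variable
        else:
            text += letter    # odd iteration: concatenate a letter
    return counter, len(text)

# 1000 iterations -> 500 increments and a 500-character string
```

Because the loop is pure computation with no telephony resources involved, it isolates the script engines' raw execution speed, which is why VXML's interpreter advantage shows up so clearly here.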
Parsing XML file

This test will involve parsing an XML configuration file. The XML element structure must have a depth of at least eight levels.

[Chart: Average call duration (seconds)]

Conclusion

Parsing XML is a processor-intensive operation, which leads to performance degradation and deformation at higher channel numbers. Up to 120 channels, both the VXML and block implementations have similar performance.

Winner: none
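A test file meeting the depth requirement can be checked programmatically; the sketch below (not from the report) measures the nesting depth of an XML document with Python's standard-library parser.

```python
# Sketch: verifying the "at least eight levels deep" requirement
# for the XML configuration file used in this test.
import xml.etree.ElementTree as ET

def element_depth(element):
    """Nesting depth of an element tree; a leaf element counts as 1."""
    children = list(element)
    if not children:
        return 1
    return 1 + max(element_depth(child) for child in children)

# Synthetic 8-level-deep configuration, as the test prescribes.
xml_text = ("<l1><l2><l3><l4><l5><l6><l7><l8>value"
            "</l8></l7></l6></l5></l4></l3></l2></l1>")
root = ET.fromstring(xml_text)
depth = element_depth(root)
```

Deep nesting matters for this test because the recursive descent multiplies the per-element parsing cost, which is what makes the operation CPU-bound at high channel counts.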
HMP Call Control

Testing will be done using a script with the following blocks:
- Set Variable (setting two integer, two float and two string local variables)
- Delay (duration 0.5 sec)
- Counter (dummy counter that iterates from 1 to 1000)
- Delay (duration 0.5 sec)
- Play (voice system prompt that says the number 555)
- Play (voice system prompt that says the number 555)

The script will run in both call control and non-call-control environments on 30, 60, 90, 120, 150, 180, 210 and 240 channels. The duration will be around 30 minutes. The script running on the first channel will have statistics logging before and after each of these 6 blocks; scripts on the other channels won't have such logging.

The result analysis will consider the following aspects:
- number of iterations performed in the 30-minute time period
- min, max and average durations of each block, as well as of the gaps between blocks
- comparison of call control and no-call-control results
- performance monitor results

[Chart: First Delay block duration — call control vs. without call control]
[Chart: Second Delay block duration — call control vs. without call control]
[Chart: First Play block duration — call control vs. without call control]
[Chart: Second Play block duration — call control vs. without call control]
[Chart: Average delay between the SetVar and Delay blocks — call control]

Conclusion

The duration of the SetVar block was constant in both the call control and no-call-control scenarios; the reason for this behavior is that SetVar is a synchronous block. The first Delay block showed a large difference between the call control and no-call-control scenarios. As the second Delay block does not show such a difference, it is natural to conclude that the difference is related to call setup; the delay between the SetVar block and the first Delay block also confirms this. The delays between all other blocks are more or less constant and minimal. Finally, both Play blocks have more or less the same results, with only a small degradation in the call control scenario.