Single voice command recognition by finite element analysis Prof. Dr. Sc. Raycho Ilarionov Assoc. Prof. Dr. Nikolay Madzharov MSc. Eng. Georgi Tsanev
Applications of voice recognition: Playing back simple information; Call steering; Automated identification; Commands for working with windows and programs; Controlling a wheelchair; Usage in education and daily life; Robotics; Removing IVR menus
Playing back simple information
This suits businesses whose customers need fast access to information. In many circumstances customers do not actually need or want to speak to a live operator. For example, if they have little time or only require basic information, speech recognition can be used to cut waiting times and provide them with the information they want. Many businesses are therefore opting for intelligent speech recognition systems in order to absorb a large increase in calls without having to increase staff. Dublin Airport provides a good example of this. By deploying intelligent speech recognition technology, it was able to cope with a 30% increase in passenger numbers without needing to increase staff levels. Incoming customer calls are filtered according to requirement. Those wanting basic information, say on departures or arrivals, are automatically directed to the speech recognition system, which quickly evaluates the nature of the enquiry through a series of prompts. At all times there is an option to speak with a live operator, if necessary. The system has been fine-tuned to pick up the vagaries of the Irish accent. The average call time has been reduced to just 53 seconds, freeing up skilled agents to handle more difficult calls.
Call steering
This means putting callers through to the right department. Waiting in a queue to get through to an operator or, worse still, finally being put through to the wrong operator can be frustrating to a customer, resulting in dissatisfaction. By introducing speech recognition, callers can choose a self-service route or alternatively say what they want and be directed to the correct department or individual. Here's a good example: Standard Life is using speech recognition for its Life and Pensions business. The solution helps in three ways: 1) it ascertains what the call is about; 2) if necessary it takes the customer through security checks; and 3) it transfers the customer to the appropriate member of staff. The details that the customer has already provided appear on the screen so that the information does not need to be repeated. Using this technology, Standard Life increased its overall call handling capacity by over 25% and reduced its misdirected calls by 66%. The system also gives the company a better understanding of why customers are calling, as it allows the customer to voice requests rather than forcing them to conform to an organisation's preconceptions of what a customer wants.
Automated identification
This allows for authenticating someone's identity on the phone without using risky personal data. Identity fraud is now one of the biggest concerns facing UK organisations, and research by the UK's fraud prevention services estimates that it is costing the UK £1.7 billion a year. Some advanced speech recognition systems provide an answer to this problem using voice biometrics, a technology which is now accepted as a major tool in combating telephone-based crime. On average it takes less than two minutes to create a voiceprint based on specific text such as "Name" and "Account Number". This is then stored against the individual's record, so when they next call they can simply say their name, and if the voiceprint matches the stored one the person is put straight through to a customer service representative. This takes less than 30 seconds and also bypasses the need for the individual to run through a series of tedious ID checks such as passwords, address details and so on. Australia's 8th largest insurer, ahm Health Management, is successfully using voice biometrics to allow existing account holders to speak to customer service representatives quickly and securely. The company has enrolled more than 20,000 customers' voiceprints.
Removing IVR menus
This allows for the replacement of complicated and frustrating push-button IVR menus. Due to poorly implemented systems, IVR and automated call handling systems are quite unpopular with customers. However, there is a way to improve this scenario. Intelligent Call Steering (ICS) does not involve any button pushing. The system simply asks the customer what they want (in their words, not yours) and then transfers them to the most suitable resource to handle their call. Callers dial one number and are greeted by the message "Welcome to XYZ Company, how may I help you?" The caller is routed to the right agent within 20 to 30 seconds of the call being answered, with misdirected calls reduced to as few as 4%. By introducing Natural Language Speech Recognition (NLSR), general insurance company Suncorp replaced its original push-button IVR, enabling the customer to simply state what was wanted. Using a financial services statistical language model of over 100,000 phrases, the system can accurately assess the nature of the call and transfer it first time to the appropriate department. The company reduced its call waiting times to around 30 seconds and misdirected calls to virtually nil.
Approach in voice control systems
Suppose that the voice control system is based on a set of single commands, each containing no more than four words. A basic principle of our approach is that samples should not be removed at random. Recording conditions vary, ranging from prepared rooms with professional equipment to commercial sound cards under normal acoustic conditions. We use the standard sampling frequency of 44.1 kHz to sample the entire audio range.
Method of processing
Our main contribution to the voice recognition investigation
$$f_s^{(i)} = \frac{n_i}{n_p}\, f_s^{(p)}$$
This means that each input signal is processed at a different frequency.
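One way to read the relation on this slide is that the input signal is assigned its own effective sampling frequency so that its duration matches that of the stored pattern. The following is a minimal Python sketch under that assumption. The function name, the argument names, and the interpretation of the ratio as (input samples) / (pattern samples) are illustrative assumptions, not taken from the slides.

```python
def effective_sampling_frequency(n_input: int, n_pattern: int,
                                 fs_pattern: float = 44100.0) -> float:
    """Return fs_input such that n_input / fs_input == n_pattern / fs_pattern.

    Assumption: the input is rescaled in time to the pattern's duration,
    so a longer input is treated as if it had been sampled faster.
    """
    if n_input <= 0 or n_pattern <= 0:
        raise ValueError("sample counts must be positive")
    return fs_pattern * n_input / n_pattern


# Hypothetical example: an input twice as long as the pattern.
fs_in = effective_sampling_frequency(n_input=88200, n_pattern=44100)
print(fs_in)
```

Under this reading no sample is discarded; only the time axis is rescaled, which is in line with the stated principle of not removing samples at random.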
Algorithm
[Figure: diagram of the algorithm; recoverable labels: frequencies, samples]
Normalising
$$\tilde{w} = \frac{w}{\|w\|}$$
Convertor
$$f_s^{(i)} = \frac{n_i}{n_p}\, f_s^{(p)}$$
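The normalising step can be illustrated with a short sketch. The slides do not say which norm is used, so the example below assumes peak (maximum absolute value) normalisation. Both that choice and the function name `normalise` are assumptions made for illustration.

```python
from typing import List, Sequence


def normalise(samples: Sequence[float]) -> List[float]:
    """Scale samples so that the largest absolute value becomes 1.

    Assumption: peak normalisation; the slides do not name the norm.
    A silent (all-zero) or empty signal is returned as zeros unchanged in length.
    """
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in samples]
    return [x / peak for x in samples]


print(normalise([1.0, -2.0, 0.5]))
```

Normalising the amplitude in this way makes recordings taken at different volumes comparable before they are matched against the pattern.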
Reductor
$$f(k) = \frac{k f_s}{n}$$
Hamming or Blackman window
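The two window functions named on this slide, together with the usual mapping from a spectrum index k to a frequency in Hz, can be sketched as follows. The code uses the standard textbook coefficients for the Hamming and Blackman windows. The function names and the reading of n as the frame length are assumptions, and the slides do not show how the authors apply the window.

```python
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n."""
    if n < 1:
        raise ValueError("window length must be positive")
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def blackman(n: int) -> List[float]:
    """Symmetric Blackman window of length n."""
    if n < 1:
        raise ValueError("window length must be positive")
    if n == 1:
        return [1.0]
    return [0.42
            - 0.5 * math.cos(2 * math.pi * i / (n - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n - 1))
            for i in range(n)]


def apply_window(frame: Sequence[float], window: Sequence[float]) -> List[float]:
    """Multiply a frame by a window of the same length, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


def bin_frequency(k: int, fs: float, n: int) -> float:
    """Frequency in Hz of spectrum index k for a frame of n samples at rate fs."""
    return k * fs / n


frame = [1.0] * 8
print(apply_window(frame, hamming(8)))
print(bin_frequency(10, 44100.0, 441))
```

Both windows taper the ends of a frame toward zero, which reduces spectral leakage. The Blackman window suppresses side lobes more strongly at the cost of a wider main lobe.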
Mel Scale and Mel filter Bank.
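As an illustration of the Mel scale, the sketch below converts between Hz and mel and places filter centre frequencies evenly on the mel axis. It assumes the commonly used formula m = 2595 log10(1 + f / 700). The slides do not state which variant of the Mel scale or how many filters the authors use, so the function names and default values here are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mel (assumed 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centre_frequencies(n_filters: int, f_min: float = 0.0,
                           f_max: float = 22050.0) -> List[float]:
    """Centre frequencies (Hz) of n_filters filters equally spaced in mel.

    The default f_max is half of 44.1 kHz; the filter count is a free parameter.
    """
    if n_filters < 1:
        raise ValueError("need at least one filter")
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (j + 1)) for j in range(n_filters)]


for centre in mel_centre_frequencies(10):
    print(round(centre, 1))
```

Because the spacing is uniform in mel rather than in Hz, the centres are dense at low frequencies and sparse at high frequencies.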
Comparing the input signal and the pattern
A set of samples with b 2 a 2
Another set of samples b 2 a 2
Again Hermitian Interpolant
Define a Quasi-Hermitian interpolant
Score functional
[Formula for the score functional J not recoverable from the slide; stated bound: < 5 %]
Real Life Experiments
Thank you!