Study of Humanoid Robot Voice Q & A System Based on Cloud



Similar documents
The Construction of Seismic and Geological Studies' Cloud Platform Using Desktop Cloud Visualization Technology

Open Access Research and Design for Mobile Terminal-Based on Smart Home System

A Robustness Simulation Method of Project Schedule based on the Monte Carlo Method

Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *

Study on the Evaluation for the Knowledge Sharing Efficiency of the Knowledge Service Network System in Agile Supply Chain

Research on Sports Information Technology Education Platform Based on ASP-NET Technology

Hardware Implementation of AES Encryption and Decryption System Based on FPGA

Open Access Design of a Python-based Wireless Network Optimization and Testing System

Modeling for Web-based Image Processing and JImaging System Implemented Using Medium Model

Current Status and Problems in Mastering of Sound Volume in TV News and Commercials

Design of Wireless Home automation and security system using PIC Microcontroller

Open Access Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform

Design for Management Information System Based on Internet of Things

How do non-expert users exploit simultaneous inputs in multimodal interaction?

How To Analyze The Time Varying And Asymmetric Dependence Of International Crude Oil Spot And Futures Price, Price, And Price Of Futures And Spot Price

The Application Research of Ant Colony Algorithm in Search Engine Jian Lan Liu1, a, Li Zhu2,b

Make search become the internal function of Internet

COMPARATIVE ASSESSMENT OF ONLINE HYBRID AND FACE-TO- FACE TEACHING METHODS

Step 1: Select the Start Menu, then Control Panel.

Open Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition

Evaluation of Wireless, Digital, Audio-Streaming Accessories Designed for the Cochlear Nucleus 6 Sound Processor

Everybody has the right to

CONTENTS. Zulu User Guide 3

The Design and Application of Water Jet Propulsion Boat Weibo Song, Junhai Jiang3, a, Shuping Zhao, Kaiyan Zhu, Qihua Wang

Welcome to the United States Patent and TradeMark Office

Study on Cloud Service Mode of Agricultural Information Institutions

The LENA TM Language Environment Analysis System:

Design Patents for Animated Images: Development Trends

Speech Recognition of a Voice-Access Automotive Telematics. System using VoiceXML

PC SECURITY LABS. Desktop Browser Performance Review. CPU USAGE RAM USAGE Power Consumption CANVASMARK 2013 GUIMARK - VECTOR CHARTING TEST.

CASE STUDY OF VEHICLE PARKING MOBILE PAYMENT APPLICATION: DATA STORAGE AND SYNCHRONIZATION SOLUTION

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

SPeach: Automatic Classroom Captioning System for Hearing Impaired

Development of a Service Robot System for a Remote Child Monitoring Platform

SPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems

Research and Practice of DataRBAC-based Big Data Privacy Protection

The Study on the Effect of Background Music on Customer Waiting Time in Restaurant

Study on Cloud Computing Resource Scheduling Strategy Based on the Ant Colony Optimization Algorithm

Experts in Leak Detection & Pressure Control

Analog Representations of Sound

Development of Integrated Management System based on Mobile and Cloud service for preventing various dangerous situations

Optimization of Distributed Crawler under Hadoop

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

An Arabic Text-To-Speech System Based on Artificial Neural Networks

BRIDGE BROADCASTING. The use of voice broadcasts for bridge teaching and mentoring has become very popular.

CELL PHONE AUDIO CONTROLLED POINT OF SALE TECHNOLOGY REVIEW REPORT. Alana Sweat Group 9 October 18, 2008

Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb

User Recognition and Preference of App Icon Stylization Design on the Smartphone

Appendices master s degree programme Human Machine Communication

DECT. DECT Density. Wireless Technology. and WHITE PAPER

Hybrid system and new business model

Service Process Model and Human Resource Management

Emotion Detection from Speech

The Complete Guide to DEVELOPING CUSTOM SOFTWARE FOR ANY BUSINESS CHALLENGE

On Cloud Computing Technology in the Construction of Digital Campus

Application of Tracking Technology to Access-control System

Understanding Alarm Systems

Proceedings of Meetings on Acoustics

Linguistic Preference Modeling: Foundation Models and New Trends. Extended Abstract

Journal of Chemical and Pharmaceutical Research, 2014, 6(5): Research Article

Analysis of Defects in Financial Accounting Management of Construction Enterprises and Corresponding Strategies

The Keyboard One of the first peripherals to be used with a computer and is still the primary input device for text and numbers.

DSP Laboratory: Analog to Digital and Digital to Analog Conversion

Research on the Facilitation of E-commerce Technology Development on New-type Urbanization Construction in the Internet of Things Era

Develop Software that Speaks and Listens

Using the Amazon Mechanical Turk for Transcription of Spoken Language

Testing FM Systems on the FP35 Hearing Aid Analyzer -1-

IVR Primer Introduction

An Investigation on Learning of College Students and the Current Application Situation of the Web-based Courses

Introduction 1.1 CONCEPT

On the move: technology Great technology makes for great sound

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

I N T R O D U C I N G

Brief Introduction Thump Bluetooth Wireless Headphones features What s in the package? Bluetooth Wireless technology...

Wireless LAN Concepts

ACCESS CHARGE A fee charged subscribers or other telephone companies by a local exchange carrier for the use of its local exchange networks.

BCS1 Bluetooth Aux Car Adapter Kit

Research on the UHF RFID Channel Coding Technology based on Simulink

Development of Integrated Management System based on Mobile and Cloud Service for Preventing Various Hazards

Design of Fuzzy Drip Irrigation Control System Based on ZigBee Wireless Sensor Network

Speech Recognition Software Review

RFID based Bill Generation and Payment through Mobile

Application of Android Mobile Platform in Remote Medical Monitoring System

Bluecoin - Voice and Music Over an Embedded BLE Platform. Central Labs AST Robotics

Intelligent Home Automation and Security System

User Interface Design, Animations and Interactive Examples for Webbased

Transcription:

Send Orders for Reprints to reprints@benthamscience.ae 1014 The Open Cybernetics & Systemics Journal, 2015, 9, 1014-1018 Open Access Study of Humanoid Robot Voice Q & A System Based on Cloud Tong Chunya and Zhong Qiubo * Shool of Electronic and Information Engineering, Ningbo University of Technology, Ningbo, Zhejiang, 315016, China Abstract: As voice technology receives wide application in our daily lives, Question-and-answer service is taking shape in the research of humanoid robot. This paper designs a humanoid robot Voice Q & A system based on cloud. After the robot collects voice signals, it would first process the signal to understand the question, and then searches the Internet for the best answers and tell it to the user with voice. In this system, dominant cloud technology is introduced and robot technology, speech recognition technology and Internet searching technologies are combined, providing people with a new way of human-computer interaction through natural language. Results show that with the application of cloud technology, the efficiency of speech recognition and the accuracy of answers have been enhanced substantially. Keywords: Human-computer interaction, humanoid robot, question-and-answer service, voice cloud. 1. INTRODUCTION Voice is the most common way of communication. According to statistics, 70%~80% of information is passed down through dialogue. 25%~50% in office involves faceto-face conversation. And 10%~20% time is spent on the phone [1]. Thus, the application of speech recognition technology in human-computer interaction has always been a hot issue. It has a promising prospect. As robot technology and artificial intelligence are gaining momentum, more and more smart robots come into our lives. But current interaction between human and robot remains the traditional way of order, such as through button or switch. Such communication is dehumanized. In order to talk with robots more convenient and more natural, the humancomputer interaction based on natural language becomes increasingly important. In Oct. 2011, on Apple Inc. s press conference for iphone 4S, the private voice assistant Siri attracted much attention. The birth of Siri, a talkative intelligent robot, indicates that speech recognition is no longer a distant dream. And intelligent robot interaction technology based on natural language leads a new wave of strategy. This paper proposes a humanoid robot Voice Q & A system according to the speech recognition technology of Xunfei Voice Cloud and Baidu s search platform of Questionand-Answer Service Cloud. Question-and-Answer Service Cloud. The system collects voice questions from users and sends them to speech recognition engine of the Cloud. The words are then sent to Baidu Zhidao for answers through the robot and the text answers finally return to the robot. 2. DESIGN FRAMEWORK OF VOICE Q & A SYSTEM Considering that the native libraries of the speech recognition engine in the robot is complicated and far from 1874-110X/15 complete. The Voice Q & A system employs the speech recognition technology in the Cloud to process data. The robot would automatically connect to the search engine in the Cloud. The design framework of the system is shown in Fig. (1) (next page). It consists of three modules: NAOrobot, Xunfei Voice Cloud and Baidu Question-and-Answer Service Cloud. Step 1: Use the voice sensor on the robot to collect voice data; Step 2: The robot visits Xunfei Voice Cloud and sends the voice data to the Voice Cloud for processing and speech recognition. Output text questions to the robot; Step 3: The robot receives text questions, visits Baidu Question-and-Answer Service Cloud for answers, and sends back the best answer; Step 4: The robot receives the text answers, processes texts, integrates voices and speaks the answer out. 2.1. Voice Data Collection This system involves NAOrobot, an intelligent humanoid robot with four microphones [2]. Voice data are collected by NAOrobot through C++ with the support of Windows, Linux. The ALSoundExtractor module in the robot can produce a voice processing function. First, set up the format of sound, including sample frequency, channel settings, etc. Then, collect sound clips and send them to the buffer for processing. Finally, after the complete processing and end voice data collection. 2.2. Reception and Transmission of Voice Data After the robot collects voice data, send the voice data to the speech recognition engine in the Cloud under open network through access point of Xunfei Voice Cloud and send the text data back to the robot through the Internet. The robot will visit the Question-and-answer service Cloud. With the 2015 Bentham Open

Study of Humanoid Robot Voice Q & A System Based on Cloud The Open Cybernetics & Systemics Journal, 2015, Volume 9 1015 Q & A service of Baidu Cloud Baidu knows Problem of text The best answer of text Speech input problem Collect audio date Analysize TTS answer Speech output Humanoid robot NAO audio message text message ASR ifly Speech Cloud Fig. (1). Design framework of voice Q & A system. Fig. (2). The network structure of Xunfei voice cloud. speech recognition engine in the Cloud, the accuracy is largely enhanced. 3. SPEECH RECOGNITION, VOICE INTEGRATION AND QUESTION-AND-ANSWER SERVICE 3.1. Speech Recognition Technology Speech recognition technology, also named as automatic speech recognition (ASR), is key to human-computer interaction technology. The robot can transmit the voice signal to corresponding texts or orders after recognition and understanding. It equals to put ears on the robot and makes it audible. A robot with ears can better fulfill human-computer interaction in a natural way [3]. Now, speech recognition telephone and speech recognition notebook spring up in the market, such as Voice Organizer [4] of US. VPTC and Parrot [5] of France. Speech recognition system is in essence a multiple recognition system, according to the process of speech recognition by human [6]. Speech recognition should be based on acoustics, phonetics and linguistics in order to have automatic recognition of high quality [7]. In this paper, Xunfei Voice Cloud is introduced to help with speech recognition. The network structure of Xunfei Voice Cloud is shown in Fig. (2) [8]. The main function of Xunfei Voice Cloud is to transmit voice to the Cloud Server and check whether the recognition result is given. If so, send the recognized texts to the local. The process of speech recognition is shown in Fig. (3).

1016 The Open Cybernetics & Systemics Journal, 2015, Volume 9 Chunya and Qiubo QISRInit QISRSessionBegin QISRGrammarActivate Fig. (3). The process of speech recognition. QISRGetResult QISRSessionEnd QISRFini 3.2. Voice Integration Technology Voice integration technology outputs the voice signal according to texts and following voice processing rules. So that the robot or the computer can read the texts fluently and people can understand when they hear. It plays the role of a mouth and enables the robot to speak [10]. The technology that transmits texts to voice is also named as Test to Speech (TTS). This system employs voice integration module in the NAOrobot to finish the speaking. But it is worth of noticing that this module supports the text coding of utf-8. However, the default text coding of vs 2010 in the windows system is GBK. So, before the NAOrobot reads the texts, there is a necessity to transmit utf-8 to GBK through Unicode. 3.3. Question-and-Answer Service Cloud Natural language processing based on large-scale corpus is a traditional way of semantic recognition technology. But the difficulty lies in the establishment of the corpus. As the Internet develops rapidly, it has become a huge and global information space that can serve as the corpus of Questionand-answer service. Google, Baidu, Sogou, SoSo and other search engines are all available. This paper selects Baidu Zhidao as the Internet corpus. The most important reason is that among all social question websites, Baidu Zhidao account is for 80% answers. It also enjoys 95% of users. Given the large population of users, the answers provided by Baidu Zhidao can be reliable and can meet the demand of users [9]. Baidu Question-and-Answer Service Cloud is mainly used to search for answers for text questions on the Internet and copy them. Search steps are described as below: Step 1: Search for the question; through the search, it can get many linked addresses relevant to the question. Rank the addresses according to their correlation. The first link has the largest correlation. Open it. Step 2: Figure out the best answer; Open the link address in Step 1 and read all answers. Obtain the webpage where the best answers locate. Step 3: Confirm the text answers; Obtain the source code of the webpage where the best answers locate and analyze the webpage in order to get the text answers. 4. RESULTS AND DISCUSSIONS Tests are carried out to see the accuracy of speech recognition of Xunfei and relevance of the answer to the question. The Internet should be connected and smooth during the test. This paper selects 10 Chinese (without gender difference) randomly. There are 10 conditions, in which two are related to the environment (quiet and noisy) in the Fig. (4), three are related to the speed of speech (fast, moderate, slow) in the Fig. (5), three are related to ways of question (one proposes 10 prepared questions, each one proposes 10 prepared questions, and each one proposes 10 questions randomly) and two are related to Question-and-Answer Service Cloud website (Sougou Wenwen,Baidu Zhidao). Fig. (4). Voice signals in two environments: What is the programming language of the computer?.

Study of Humanoid Robot Voice Q & A System Based on Cloud The Open Cybernetics & Systemics Journal, 2015, Volume 9 1017 Table 1. Speech recognition test: each one proposes 10 prepared questions, two environments and three speech speed. Test Status Quiet Noisy Fast Moderate Slow Accuracy of speech recognition 90.4% 76.1% 79.0% 90.4% 81.7% Table 2. Test result of one person raising 10 prepared questions to the robot. Users Question Text Recognition Successful Recognition Rate Correlation What s your name? What s your name 100% 0% What can you do? What can you do 100% 0% Where is Ningbo? Where is Ningbo What are interesting places in Ningbo? What are the programming language of the computer Where is the capital of U.S.? When is the Christmas? Which country does Santa Claus come from? Who is the author of The Old Man and the Sea? What does it mean by I am drunk? What are interesting places in Ningbo? What are the programming language of the computer Where is the capital of U.S.? When is the Christmas? Which country does Santa Claus come from? Who is the author of The Old Man and the Sea? What does it mean by I am drunk? 100% 99.5% 90% 87% 87.5% 100% 90% 94% Fig. (5). Three speech speed of voice signals: What is the programming language of the computer?. Table 3. Test result of each one raising 10 random questions to the robot. From Tables 1-4, we can see with the access to the Internet, if the questioner speaks Mandarin, or if the question is reasonable, the speech recognition accuracy and the correlation of the answer can meet the expected requirement. It can be applied to humanoid robot and fulfill human-computer interaction based on natural language. Performance Speech Recognition Correlation Success rate 0.91 0.95

1018 The Open Cybernetics & Systemics Journal, 2015, Volume 9 Chunya and Qiubo Table 4. Comparison of question-and-answer service cloud. Question-and-Answer Service Cloud Sougou Wenwen Baidu Zhidao Success rate 0.70 0.97 Correlation 0.97 0.95 CONCLUSION The study of humanoid robot Voice Q & A system is a hot issue and holds significance. This paper designs a humanoid robot Voice Q & A system based on the Cloud. Through integrating Xunfei Voice Cloud and Baidu Question-and-Answer Service Cloud and processing the voice in the Cloud, the speech recognition accuracy can be largely enhanced. The answers provided are also highly relevant to the question. It realizes the goal that the robot hears the question and gives answers after a short pause. This system makes natural semantic recognition possible and provides an intelligent way of human-computer interaction. But there are still limitations, such as limited to one-question-one-answer and user-oriented. CONFLICT OF INTEREST The authors confirm that this article content has no conflict of interest. ACKNOWLEDGEMENTS This work was supported by Zhejiang Provincial Natural Science Foundation of China (No.LQ12D01001, No. LQ12F03001), Natural Science Foundation of China (No. 61203360), Ningbo City Natural Science Foundation of China (No.2012A610043, No.2012A610009). REFERENCES [1] L. Zheng, and W. Minghui, The research and implementation of user and computer interactional contact surface based on pronunction, Journal of Jingmen Technical College, vol. 6, pp. 14-18, 2007. [2] NAO Software 2.1 documentation Copyright. Aldebran-Robotics. 2014. [3] ifly Mobile Speech Platform 3.0 Xunfei Voice changes mobile life. Anhui Ustc Xunfei Information Technology Co., LTD. Computer & Information Technology. 2011(10). [4] K. Yi, and J. Cheng, A vocoder based on speech recognition and synthesis, In: Globcom'95, Sigarpore, 1995. [5] C. P. Chen, J. Bilmes, and K. Kirchhoff, Low-resource noiserobust feature post-processing on auroa, In: Proceedings of ICSLP, 2002, pp. 17-20. [6] M. Yafei, D. Zhongping, H. Huasong, and C. Lin, Design and realization of human-computer interaction based on intelligent voice, Radio & Television Information, vol. 1, pp. 67-69, 2014. [7] L. Lin, Study of speech recognition and the human-computer interaction system of domestic robot. Harbin Institute of Technology: Harbin, 2007. [8] ifly Mobile Speech Platform 2.0 Development manual of Xunfei Voice platform [EB\OL]. Anhui ustc xunfei information technology co., LTD. http://open.voicecloud.cn/index.php/default./doc centerlnner?itemtitle=ann3za= =.2014.12. [9] J. Nan, and W. Pengcheng, Study on the Evaluation creteria of relevance of users' need and information content in social question and answer service: take baidu knows as an example, Journal of Information Resources Management, vol. 3, pp. 35-45, 2012. Received: February 17, 2015 Revised: May 21, 2015 Accepted: June 09, 2015 Chunya and Qiubo; Licensee Bentham Open. This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/- licenses/by-nc/3.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.