Multimodal Web Content Conversion for Mobile Services in a U-City

Soosun Cho *1, HeeSook Shin *2
*1 Corresponding author. Department of Computer Science, Chungju National University, 123 Iryu, Chungju, Chungbuk, 380-702, Korea
*2 Wearable Computing Research Team, U-Computing Research Department, IT Convergence Technology Research Laboratory, Electronics and Telecommunications Research Institute, 161 Kajong, Yusong, Taejon, 305-305, Korea
sscho@cjnu.ac.kr, hsshin8@etri.re.kr

Abstract

A ubiquitous city is one where everything is interconnected with everything else and information is shared instantaneously. In a U-city, people can access a variety of web data in many forms using small-screened mobile devices. In this paper we propose an approach that provides web information in either auditory or visual presentation by creating additional VoiceXML documents from existing HTML documents. With this approach, users can obtain web information by viewing and by hearing at the same time on mobile devices. We explain the algorithms for efficient conversion to multimodal web pages and a system design based on them.

Keywords

Multimodal Web Information, Web Document Conversion, HTML, VoiceXML, Mobile Services in a U-City

1. Introduction

The U-city is a ubiquitous city designed to improve economic growth and people's quality of life in all social areas by using next-generation IT technologies. It integrates technology, information services, and people's lifestyles to make life more livable, safe, comfortable, and convenient [1]. For example, in a U-city people will be able to access a variety of mobile data in many forms using small-screened mobile devices. They can view or hear the morning news, e-mail, or messages from friends on the way to work, and can even select the form of data most comfortable for their current circumstances. Because high-performance mobile networks are part of the basic infrastructure of U-cities, the presentation of multimodal web pages on mobile hand-held devices is a genuinely applicable information technology.

In this paper, we introduce an approach that converts existing rich web pages designed for desktop PCs into simple visual and auditory content to support the needs of mobile web users, such as residents of a U-city. From the viewpoint of web data reuse, this is valuable because the world's huge volume of web documents can be reused on small-screened devices through automatic transcoding technologies. From the viewpoint of supporting user mobility, it is also valuable because the method provides multimodal web data formats that can be selected according to users' circumstances, for example, whether or not they are walking or driving.

The basic idea of our approach is to provide web information in either auditory or visual presentation by creating additional VoiceXML [2, 3] documents from content selected from HTML documents. We have developed suitable algorithms for efficient conversion to multimodal web documents and implemented a prototype based on a system design built around these algorithms.

The rest of this paper is organized as follows. In Section 2, we introduce work related to our approach. In Section 3, we explain the main algorithms of our approach and the design concepts of the prototype system. Finally, we conclude in Section 4.
2. Related Works

Many kinds of approaches have been proposed for presenting rich web content on small-screened mobile devices. DRESS (Document Representation for Scalable Structure) [4] uses the layout structure of existing web pages to support effective information acquisition, together with a text summarization method.
WebAlchemist [5, 6] applies semi-semantic information in several heuristic transcoding algorithms to improve the quality of transcoded complex web pages, but it is still limited because its basic idea rests on partial extraction and page division. The Opera Mobile [7] browser, based on Small-Screen Rendering, reflows original web pages into a single narrow column that fits the screen width. The result resembles the original page, but the approach offers no solution to the resulting page length. Our approach converts long HTML text into a VoiceXML document, which need not be displayed on screen, so it substantially reduces the size of the HTML document to be displayed. It is therefore a helpful display method for small-screened devices.

On the other hand, although the web is designed primarily for visual access, the need to access web pages through auditory media has been increasing. It is difficult, however, to convert whole HTML documents composed of visual components into voice-oriented linguistic forms such as VoiceXML, because VoiceXML tags cannot simply substitute for HTML tags, and even multiple VoiceXML documents cannot express all the information contained in a single HTML document. That is, VoiceXML differs greatly from HTML in its capacity for expressing information. Therefore, some existing transcoding methods target only limited web content [8], while others rely on additional translation information such as annotations [9, 10]. These methods remain restricted in their support for complicated original web pages because they attempt to deliver the information through the voice medium alone.

Our approach differs from and improves upon these existing methods: it provides web information in either auditory or visual presentation by creating additional VoiceXML documents. It generates voice information from representative content selected from an HTML document rather than from the whole document. Users thus gain the benefit of accessing web pages by viewing and by hearing at the same time. In other words, users can view the current web page while hearing the main content of a hyperlinked page, without moving to the next page.

3. Conversion to Visual and Auditory Information

3.1 Functions of Multimodal Web Access

For automatic conversion of web documents to support visual-auditory access, we transcode a selected content block of an HTML document into VoiceXML rather than the whole document, for the sake of efficiency. This technique is especially effective and accurate on sites such as news articles, web mail pages, and web notice boards. Figure 1 shows an example of transcoding results for Yahoo's mail pages.

In Figure 1 (a), when a hyperlinked object in a list is selected on web page A, the new web page B is rendered and displayed on the screen. If web page A is instead converted using our method, web page C is generated as the result; it carries specific icon images such as D to indicate that the web information can also be accessed through sound. When a user selects the icon image, web page C remains unchanged and the text E, which contains the main information of web page B, is read aloud without moving to page B. Thus HTML web pages such as A and B are converted into a revised HTML page and a generated VoiceXML page, such as C and E. Of course, if users select the original text hyperlink on web page C, they can still navigate to web page B. Users can therefore select the more comfortable web data format, visual or auditory, according to their needs.

Figure 1. (a) Original Yahoo mail pages (b) Transcoded Yahoo mail pages
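To make this first conversion step concrete, the following Python sketch (using BeautifulSoup) shows how a revised page like C could be produced from a page like A by inserting a sound-icon hyperlink before each text hyperlink. This is a simplified illustration, not the paper's implementation: the real system chooses insertion points with the Voice Target Extraction algorithm of Section 3.2, and the /voice endpoint and icon path below are hypothetical.

```python
# A simplified sketch of producing page C from page A: every hyperlink gains
# a preceding sound-icon link (D) that requests auditory playback of the
# linked page's main content. The /voice endpoint and icon path are
# hypothetical; the real system selects insertion points with the
# Voice Target Extraction algorithm rather than decorating every link.
from bs4 import BeautifulSoup

def insert_sound_icons(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=True):
        voice_link = soup.new_tag("a", href="/voice?target=" + link["href"])
        icon = soup.new_tag("img", src="/icons/sound.gif", alt="listen")
        voice_link.append(icon)
        link.insert_before(voice_link)  # icon D appears just before the text link
    return str(soup)

page_a = '<ul><li><a href="/mail/1">Meeting tomorrow</a></li></ul>'
print(insert_sound_icons(page_a))  # page C: the list item now carries an icon
```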
3.2 Main Algorithms

We now introduce the two heuristic algorithms that carry out the aforementioned functions. First, in a pre-processing step, a data tree structure is generated from the HTML document, and a virtual tree consisting of the tags that carry layout or data information is constructed. Tags for scripts, event handlers, and style-property definitions are excluded from the virtual tree. The following algorithms take this virtual tree as input.

In order to insert a specific icon image such as D in Figure 1, we execute the Voice Target Extraction algorithm, shown as a flowchart in Figure 2. It finds the target objects that are connected to further HTML documents via hyperlinks, exploiting the visually separated structure of the page to extract them. At most five nodes can be candidate target objects, and only one node is selected as the parent of the extracted target nodes, using the comparison values listed in Table 1. The sound icon, carrying a new hyperlink, is then inserted as a preceding sibling of the extracted node.

Table 1. Parameters for comparing patterns in the Voice Target Extraction algorithm

  Parameter                                                  Expected value
  Number of child nodes with hyperlinks                      -
  Number of characters of text in each child node            -
  Tree traversal path from the root node to the child node   similar to sibling nodes
  Deflection of text lengths across child nodes              < maximum value

Figure 2. Voice Target Extraction Algorithm: it inserts a specific icon image that indicates the accessibility of the web information through sound media.
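As a rough sketch of the pre-processing step described at the start of this subsection, the following Python code (standard library only) builds a simple virtual tree, skipping script/style content and dropping event-handler attributes. The Node class, the tag sets, and all names are our illustrative assumptions, not the paper's actual data structures.

```python
# A sketch of the pre-process: build a "virtual tree" that keeps only tags
# carrying layout or data information. Scripts/styles are skipped entirely
# and event-handler attributes (onclick, ...) are dropped; the tag sets and
# the Node structure are illustrative assumptions.
from html.parser import HTMLParser

EXCLUDED = {"script", "style", "noscript"}           # assumed: carry no content
VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag

class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.parent = tag, parent
        self.attrs = {k: v for k, v in (attrs or []) if not k.startswith("on")}
        self.children, self.text = [], ""

class VirtualTreeBuilder(HTMLParser):
    """Builds the virtual tree that both extraction algorithms take as input."""
    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("root")
        self.skipping = None                         # tag of the excluded element

    def handle_starttag(self, tag, attrs):
        if self.skipping:
            return
        if tag in EXCLUDED:
            self.skipping = tag                      # drop the whole subtree
            return
        node = Node(tag, attrs, parent=self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node                          # descend into container tags

    def handle_endtag(self, tag):
        if self.skipping == tag:
            self.skipping = None
        elif self.cur.parent is not None and tag == self.cur.tag:
            self.cur = self.cur.parent

    def handle_data(self, data):
        if self.skipping is None:
            self.cur.text += data.strip()
```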
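And a minimal sketch of the selection step of the Voice Target Extraction algorithm over the same assumed Node structure: it collects up to five candidate parents of hyperlinked children and applies the Table 1 comparisons. The concrete scoring formula and the deflection threshold are our assumptions, since the paper specifies the parameters but not their weights; the path-similarity comparison is omitted for brevity.

```python
# A sketch of the Voice Target Extraction heuristic (Figure 2, Table 1):
# gather at most five candidate parents of hyperlinked children, require the
# deflection of their text lengths to stay below a maximum, and keep the
# best-scoring node. Weights and threshold are illustrative assumptions.
def hyperlinked_children(node):
    return [c for c in node.children
            if c.tag == "a" or any(g.tag == "a" for g in c.children)]

def deflection(lengths):
    # mean absolute deviation of text lengths; regular lists score low
    mean = sum(lengths) / len(lengths)
    return sum(abs(n - mean) for n in lengths) / len(lengths)

def candidate_parents(root, limit=5):
    found, stack = [], [root]
    while stack:                                     # walk the virtual tree
        node = stack.pop()
        if len(hyperlinked_children(node)) >= 2:
            found.append(node)
        stack.extend(node.children)
    found.sort(key=lambda n: len(hyperlinked_children(n)), reverse=True)
    return found[:limit]                             # at most five candidates

def select_target_parent(root, max_deflection=40):
    best, best_score = None, -1.0
    for node in candidate_parents(root):
        links = hyperlinked_children(node)
        lengths = [len(c.text) for c in links]
        if deflection(lengths) >= max_deflection:
            continue                                 # Table 1: bounded deflection
        score = len(links) + sum(lengths) / 100.0    # assumed combination
        if score > best_score:
            best, best_score = node, score
    return best   # the sound icon is inserted as its preceding sibling
```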
Secondly, in order to extract an important content block from the HTML document, we use the Representative Content Extraction algorithm shown in Figure 3. While traversing the virtual tree of structural tags, it selects the target content by comparing nodes against the parameters listed in Table 2, and then creates a VoiceXML document containing the text of the extracted node. This document is delivered to a VoiceXML interpreter and spoken to the user through a Text-To-Speech (TTS) engine.

Table 2. Parameters for comparing patterns in the Representative Content Extraction algorithm

  Parameter                                   Expected value
  Text-length variation from sibling nodes    -
  Number of characters of the text            -
  Position in the whole tree                  near the center

Figure 3. Representative Content Extraction Algorithm: it selects the target content through comparison processing and creates VoiceXML documents containing the text of the extracted node.

3.3 System Design

Using the main algorithms, we designed a prototype automatic transcoding system, shown in Figure 4. The system consists of three modules and operates in two steps. The voice information file can be generated at a web server, a browser, or a proxy server, because the conversion process can be executed on any of these servers or on client browsers. In this paper, we designed our transcoding system in a web browser environment and evaluated its functionality.

In the Target Object Extraction module, target objects carrying hyperlinks are extracted. These hyperlinks point to the HTML documents that can be converted into voice information. This module also inserts the sound icon, which signals that the corresponding voice information is available. In the Representative Content Block Extraction module, the representative content is extracted from the hyperlinked HTML document. Finally, in the VoiceXML Generation module, a VoiceXML document is generated from the representative content extracted in the previous step.

Our transcoding system works in two steps. The first step converts an original HTML document into a revised HTML document carrying sound icons. The second step extracts the important content block from the selected HTML document and generates a VoiceXML document from it. The HTML document produced in the first step is presented visually in the user's browser; when the user selects a sound icon, the VoiceXML document is generated in the second step and rendered as sound by the VoiceXML interpreter and TTS engine. Through these processes, our approach supports viewing the current web page and hearing a representative part of another web page at the same time, so users can acquire a great deal of web information at once through a more convenient interface.

Figure 4. A block diagram of the automatic transcoding system: the first step converts an original HTML document into a revised HTML document with sound icons (Target Object Extraction module); the second step extracts the important content block from the selected HTML document and generates a VoiceXML document (Representative Content Block Extraction and VoiceXML Generation modules).
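For illustration, a minimal sketch of the Representative Content Extraction selection (Figure 3, Table 2) over the same assumed virtual tree: it favors nodes with long text, a large text-length difference from their siblings, and a position near the center of the tree. The weighting is our assumption; the paper fixes only the parameters.

```python
# A sketch of the Representative Content Extraction heuristic (Figure 3,
# Table 2): score each text-bearing node by text length, text-length
# variation from its siblings, and closeness to the center of the tree.
# The scoring weights are illustrative assumptions.
def nodes_in_order(root):
    out, queue = [], [root]
    while queue:                                     # breadth-first traversal
        node = queue.pop(0)
        out.append(node)
        queue.extend(node.children)
    return out

def representative_content(root):
    nodes = nodes_in_order(root)
    best, best_score = None, -1.0
    for i, node in enumerate(nodes):
        if not node.text:
            continue
        siblings = node.parent.children if node.parent else []
        others = [len(s.text) for s in siblings if s is not node]
        # Table 2: text-length variation from sibling nodes
        variation = min(abs(len(node.text) - n) for n in others) if others else 0
        # Table 2: position in the whole tree, expected near the center
        centrality = 1.0 - abs(i - len(nodes) / 2) / (len(nodes) / 2)
        score = len(node.text) + variation + 100.0 * centrality
        if score > best_score:
            best, best_score = node, score
    return best   # its text becomes the body of the generated VoiceXML
```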
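Finally, a sketch of the VoiceXML Generation module and the two-step wiring of Figure 4, reusing the hypothetical helpers from the sketches above. The `<vxml>/<form>/<block>/<prompt>` skeleton follows VoiceXML 2.0; the function names and the wiring are assumptions for illustration, and the sketch is agnostic about whether it runs on a web server, proxy, or client browser.

```python
# A sketch of the VoiceXML Generation module and the two-step pipeline of
# Figure 4, reusing insert_sound_icons, VirtualTreeBuilder, and
# representative_content from the sketches above.
from xml.sax.saxutils import escape

def generate_voicexml(text: str) -> str:
    # The document handed to the VoiceXML interpreter and spoken via TTS
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block><prompt>' + escape(text) + '</prompt></block>\n'
        '  </form>\n'
        '</vxml>\n'
    )

def transcode_step1(original_html: str) -> str:
    """Step 1: original HTML -> revised HTML with sound icons (visual part)."""
    return insert_sound_icons(original_html)

def transcode_step2(linked_html: str) -> str:
    """Step 2: hyperlinked HTML -> VoiceXML of its representative content."""
    builder = VirtualTreeBuilder()
    builder.feed(linked_html)
    node = representative_content(builder.root)
    return generate_voicexml(node.text if node else "")
```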
4. Conclusions

Our technology offers an efficient browsing environment: it combines visual and auditory presentation of HTML documents with viewing suited to a small screen. Its multimodal accessibility also provides an efficient interface for mobile computing environments. It is one of the key technologies for the U-city, because multimodality in support of mobile computing is essential to life in a U-city, where everything is interconnected with everything else and information is shared instantaneously. Figure 5 shows an example of the operation of a multimodal web browsing system that we plan to implement with our approach in the near future.

We have prototyped the transcoding system on our own web browser [11] for a PC rather than a hand-held device. The browsing results were acceptable in tests with real web pages, as shown in Figure 5. In particular, our method shows high effectiveness and accuracy on sites such as news articles, web mail pages, and web notice boards. These web sites are familiar to most people and are accessed in everyday life, so our approach is well suited to the current web environment. Although we have confirmed its efficiency through prototyping, we will continue implementation and evaluation in field settings resembling a U-city to make our methods more applicable there.

Figure 5. An example of the operations of a multimodal web browsing system: it provides visual and auditory presentation of HTML documents, viewing suited to a small screen, and an efficient interface in mobile computing environments. (Panels: a text-and-speech browser showing the visual part with a voice command icon; linked documents translated to VoiceXML form the auditory part.)

5. References

[1] Baek, S.Y., "Ubiquitous City, Buzz That Is Here to Stay", The Korea Times, Oct. 30, 2006.
[2] VoiceXML Forum, VoiceXML 1.0, http://www.voicexml.org/, 2003.
[3] W3C, VoiceXML 2.0, http://www.w3.org/TR/2001/WD-voicexml20-20011023/, 2003.
[4] Xie, X., Wang, C., Chen, L., Ma, W.Y., "An adaptive web page layout structure for small devices", Multimedia Systems, Vol. 11, No. 6, pp. 34-44, 2005.
[5] Hwang, Y., Kim, J., Seo, E., "Structure-aware web transcoding for mobile devices", IEEE Internet Computing, Vol. 7, No. 5, pp. 14-21, 2003.
[6] Hwang, Y., Jung, C., Kim, J., Chung, S., "WebAlchemist: A Web Transcoding System for Mobile Web Access in Handheld Devices", SPIE ITCOM, 2001.
[7] Opera, SSR: Small Screen Rendering, http://www.opera.com/products.mobile/
[8] Choi, H.I., Jang, T.G., "Design and Implementation of a HTML to VoiceXML Converter", Korea Information Science Society Journal, Vol. 7, pp. 559-568, 2001.
[9] Asakawa, C., "Annotation-Based Transcoding for Nonvisual Web Access", ASSETS 2000, pp. 172-179, 2000.
[10] Goose, S., Newman, M., Schmidt, C., Hue, L., "Enhancing Web accessibility via the Vox Portal and a Web-hosted dynamic HTML <-> VoxML converter", Computer Networks, Vol. 33, pp. 583-592, 2000.
[11] Lee, D., Cho, S., Shin, H., Choi, E., Han, D., "Design of a Voice Support Browser having Contents Converter", WSEAS Transactions on Circuits and Systems, Vol. 3, Issue 2, pp. 416-419, 2004.

Authors' bio.

Soosun Cho is an assistant professor in the Department
of Computer Science, Chungju National University. Her research interests include web document conversion, mobile browsers, and web image analysis. HeeSook Shin is a researcher in the Wearable Computing Research Team, U-Computing Research Department, IT Convergence Technology Research Laboratory, ETRI. Her research interests include multimodal web access and smart interfaces for mobile and wearable computing devices.