Scalable Video Streaming in Wireless Mesh Networks for Education

LIU Yan, WANG Xinheng, LIU Caixing
1. School of Engineering, Swansea University, Swansea, UK
2. College of Informatics, South China Agricultural University, Guangzhou, China
465839@swansea.ac.uk; xinheng.wang@swansea.ac.uk; scauliu@21cn.com

Abstract

In this paper, a video streaming system for education based on a wireless mesh network is proposed. A wireless mesh network is a self-organizing, self-managing and reliable intelligent network which allows educators to deploy a network quickly whenever there is a need. Video streaming plays an important role in this system for multimedia data transmission. The new system adopts a scalable video coding scheme that enables the video server to deliver layered videos to different user groups. In addition, to improve the robustness and efficiency of video streaming, a new approach was developed that transfers the important video control parameters by SDP at the beginning of a session and by RTCP during the video transfer phase. Furthermore, a quality control method was developed to automatically change the output data rate based on network conditions. Test results from a real implementation show that the proposed methods are effective.

1. Introduction

The latest developments in Information and Communication Technologies (ICTs) have a significant impact on the quality of education. Radio, television, the Internet and satellite have become very popular tools for delivering learner-centered education services to those who are in need, particularly those who live in remote areas and find it difficult to access education resources, or those who would like to learn at flexible times. Wireless communications, including satellite, mobile communications and WLAN, provide mobility to users and can break the barrier of space to deliver education resources. These technologies are especially attractive to those who live in remote areas or want to learn anytime and anywhere.
WLAN technology, with its low cost, high-bandwidth Internet access and support for mobile users, is especially appealing nowadays in educational institutions [1]. Most universities in western countries are covered by WLAN for teachers and students to access the Internet and share resources. However, WLAN technology has some inherent disadvantages: the access points are fixed, the coverage range is short, and mobility between access points is limited. This hinders the application of WLAN in situations where a temporary and mobile network is needed. Recently, a new networking technology, the wireless mesh network (WMN), has been developed to break these barriers of WLAN [2]. A WMN is a self-organizing, self-managing and reliable wireless communication network with a dynamic topology that allows users to join and leave automatically. It enlarges the coverage of WLAN and provides full mobility for end users. In addition, it can be deployed wherever there is a need, using battery or solar power on special occasions such as outdoor wildlife observations and sport activities.

Further to the previous successful development of WMNs in healthcare [3][4], a WMN-based education system has been developed in this paper. The new system focuses on video streaming over a WMN to provide multimedia education services. It classifies the end users into different communication groups based on the processing ability of their devices and the conditions of the network to achieve optimal transmission results. In addition, a new method is proposed to transmit the video control messages, which reduces the overhead of the network for real-time video streaming.

The rest of this paper is organized as follows: In Section 2 the wireless mesh network technology is introduced. Section 3 describes the framework design of the video streaming system in a WMN. In Section 4 the scalable video streaming method is presented.
Section 5 describes the methods for transmitting the video control information. The quality control of SVC is discussed in Section 6. Section 7 presents the system evaluation. Finally, Section 8 concludes the paper.

2. Technology of WMN

The technology of the wireless mesh network (WMN) was invented in the 1990s. Compared with WLAN, a WMN extends the communication coverage with the help of multi-hop routing. A typical wireless mesh network consists of two types of nodes: mesh routers and mesh clients. Mesh routers, which form the backbone of the wireless mesh network, have the routing function to support mesh networking, and some of them work as gateways to connect to outside networks. Mesh clients are the end users in the network. Sometimes mesh clients can also work as mesh routers if they have routing capability. Generally, nodes with less mobility and high power will be chosen as mesh routers. The architecture of a WMN is illustrated in Figure 1.

3. Framework Design of Video Streaming in WMN

To improve the flexibility of video streaming for education, a new system was designed and developed based on a WMN, as shown in Figure 2. In this system a camera serves as the video streaming server. It is responsible for capturing, encoding and transmitting real-time video streams. The end user devices, which connect to the server over the WMN to receive the videos, can be laptops, PDAs or mobile phones.

When end users want to receive the real-time videos, they send requests to the server. The server returns the transmission information (multicast address and port number) and video parameters, such as the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), to the end users by the Session Description Protocol (SDP), and then begins to transmit the real-time videos by multicast. After the end users receive the necessary information, they can join the multicast group(s) to receive and replay the videos. If any user stops receiving the videos, it leaves the multicast group.
If no one in the network is receiving videos, the server stops sending videos automatically, or it can be switched off manually.

Figure 1. Architecture of WMN

A WMN has several key characteristics:
1. Multi-hop. A node can communicate with other nodes out of sight without increasing the radio strength, via forwarding by neighbor nodes.
2. Self-organization, self-management and self-healing. This makes the mesh network more robust. When any node in the network fails, the other nodes can remove the routes to it and establish new routes to maintain the network.
3. Low client energy consumption. With the help of the mesh routers, clients do not need to consume energy on routing. This is an important characteristic when the end users use battery-powered devices, such as PDAs and smart phones.

Figure 2. Video streaming system in WMN

Figure 3 illustrates the layered architecture of the developed video streaming system. Before transmitting the videos, SDP is used to negotiate between the clients and the server. Because the information conveyed by SDP is crucial to the video streaming, TCP is used on the transport layer to make sure it is delivered error free. On the server side, real-time videos are captured by Video for Linux (V4L) [5]. The captured videos are then encoded with H.264/SVC, which supports scalable video coding. Before being sent out by multicast, the encoded videos are encapsulated in RTP (Real-time Transport Protocol) to improve the reliability with the help of RTCP (RTP Control Protocol). On the client side, after decoding, the SDL (Simple DirectMedia Layer) multimedia library is used to replay the videos [6].
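The negotiation step described above can be illustrated with a minimal sketch: before joining the multicast group, the client extracts the multicast address (the "c=" line) and port (the "m=" line) from the server's SDP reply. The SDP text and addresses below are invented examples for illustration, not values from the real system.

```python
def parse_sdp(sdp: str) -> tuple[str, int]:
    """Extract the multicast address (c= line) and port (m= line) from an SDP reply."""
    addr, port = "", 0
    for line in sdp.splitlines():
        if line.startswith("c="):
            # e.g. "c=IN IP4 239.0.0.1/127" -> address comes before the optional TTL
            addr = line.split()[2].split("/")[0]
        elif line.startswith("m="):
            # e.g. "m=video 5004 RTP/AVP 96" -> the second field is the port
            port = int(line.split()[1])
    return addr, port

# Hypothetical SDP reply from the video server.
reply = "\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.168.1.10",
    "s=education video",
    "c=IN IP4 239.0.0.1/127",
    "m=video 5004 RTP/AVP 96",
])
print(parse_sdp(reply))  # -> ('239.0.0.1', 5004)
```

With these values the client would then join the multicast group (for example via an IP_ADD_MEMBERSHIP socket option) and start receiving the RTP stream, leaving the group again when the user stops watching.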
Figure 3. Layered architecture of video streaming system

On the network layer, the previously developed multicast AODV (MAODV) protocol is used for packet forwarding [3]. Since video transmission is time sensitive, UDP is deployed on the transport layer for streaming the videos. The details of some important technologies in this video streaming system are discussed in the following sections.

4. Scalable Video Streaming

H.264/SVC is an extension of the H.264/AVC standard. H.264/AVC [7], which stands for ITU-T H.264 / MPEG-4 (Part 10) Advanced Video Coding, is the most recent international video coding standard. Compared with prior video coding standards, H.264/AVC provides significantly improved coding efficiency. Some new features are also designed to improve the robustness and flexibility of video transmission over a variety of networks. For instance, the parameter set structure separates out the key information [8]. The key information consists of the Picture Parameter Set (PPS) and Sequence Parameter Set (SPS), which can be transmitted over a reliable channel to improve robustness.

SVC is used to generate scalable video streams for heterogeneous environments, consisting of one base layer (BL) and several enhancement layers (EL). The base layer provides the base quality video, while the enhancement layers refine the video quality. The standard H.264/SVC transmission mode aggregates the multiple layers into a single bit stream before transmitting the video on the network. When the base layer and enhancement layers are generated, the encoder marks every layer and then aggregates all of them into one stream.
Receivers then decode one or more layers in the stream based on their ability, keeping the wanted layers and discarding the unwanted ones, as depicted in Figure 4.

Figure 4. Standard SVC transmission

The advantage of this method is that only one port needs to be opened on the firewall at the receiver side, which is more secure in complex environments. In a private wireless network, however, this is not a crucial point, especially when the network is temporarily constructed. Moreover, the method has a big disadvantage: it can waste a lot of bandwidth, which is the most important concern in wireless networks. To address this problem, another transmission method was designed in our system.

The goal of developing SVC is to meet different end users' requests, so the crucial issue is to classify the end user devices. Generally, the criteria for distinguishing devices are processing ability and screen size. Nowadays, with the development of powerful CPUs, even small devices have strong processing ability, so screen size becomes the most important criterion. Although various screen sizes can be found in mobile devices, they can simply be classified into two types: big screen and small screen devices. In this system, a device whose screen is big enough to display 640*480 pixels (VGA) is classified as a big screen device; otherwise, it is a small screen device.

Considering the screen sizes and the complexity of deploying multicast groups, the encoder generates only two layers of video streams: a base layer and an enhancement layer. They are packed into two separate multicast groups, as shown in Figure 5. The base layer is designed to be replayed on small screen devices, so its video size is set to 320*240 pixels (QVGA). The enhancement layer is packed into the other multicast group and sent to big screen devices. Compared with the standard transmission model, the aggregation of the two layers into one stream in the encoder is removed. Devices with a big screen join both multicast groups, while a device with a small screen joins only one. The enhancement layer does not need to be transmitted to small screen devices, which saves considerable bandwidth.

Figure 5. Scalable video streaming

5. Transmission of Video Information

Video streaming has different characteristics from common data transmission over networks, so some special protocols have been developed to increase the efficiency of video streaming. SDP [9], which stands for Session Description Protocol, is designed for negotiation between end users in multimedia communications. It does not deliver the media data itself, but provides a standard format to exchange the initialization parameters in an ASCII string. With the help of SDP, the end users can obtain the necessary information before they actually transmit the videos.

H.264/SVC has a feature which separates the video parameter information from the real video data. This information includes the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), which contain crucial information for decoding the video. Traditionally these parameters are transmitted in-band, which is unreliable. In this paper, a new method is proposed to transmit the SPS and PPS information during both the initialization phase and the streaming phase.

To reduce the waiting time before playing the video and to improve the reliability of transferring this important information, the video parameter information was added to the SDP message. The video parameter information consists of three types of packets: SPS, SPS Extension (SPSE) and PPS. Every packet of the three types was added as an "a" parameter of the SDP session description part, which is underlined in Figure 6. The format is described as follows:

a = PPS x/y: (Value)
a = SPS x/y: (Value)
a = SPSE x/y: (Value)

Figure 6. Structure of SDP

Here x/y means the xth packet out of y packets.
Value is the content of the packet in decimal format. For example, suppose there are three PPS packets and the first one has 4 bytes; the resulting format is shown in Figure 7.

Figure 7. Structure of video information

With this scheme, when the receiver receives the SDP reply from the sender, the video parameter information can be passed to the decoder immediately, so that as soon as the decoder receives the stream data, the videos can be decoded without further delay.

During video transmission the video parameters may change, so the SPS and PPS must be sent again. If they are transferred together with the video data, it is difficult to know whether every end user has received them. Therefore, the updated video information is transferred by RTCP Application-Defined (RTCP APP) packets: the video parameter information is added to an RTCP APP packet and sent out to the receivers. To ensure the successful delivery of the RTCP APP packet, it is sent twice within a certain time T, after which the receivers start to use the new video parameters. T is calculated as:
T = 2 * Ti  (if Ti > Ts)
T = 2 * Ts  (if Ti < Ts)                                 (1)

where Ti is the time interval between the first and second RTCP APP packet sending times, and Ts is the round trip time of a packet travelling from the server to a receiver. Ti can be calculated by

Ti = [Sa * Nr + (Sa + Ss) * Ns] / B                      (2)

where Sa is the average size of an RTCP packet, Ss is the size of the RTCP APP packet, Nr and Ns stand for the number of receivers and senders, respectively, and B is the bandwidth allocated for RTCP transmission.

Within time T, two Application-Defined RTCP packets are sent. Their contents include the SPS and PPS information, a sequence number that is updated for each new set of SPS and PPS information, and the time T. When a receiver receives this RTCP packet, a feedback RTCP APP packet is sent back to the video server. The video server compares the number of received feedbacks with the number of receivers. If some end users did not receive the packet, the server sends it again until all the receivers have received it.

6. Quality Control of SVC in WMN

Since the condition of the network has a significant impact on the quality of the video, the network condition is considered, together with screen size, when deciding which layer should be delivered to a receiver. The condition of the network is collected by RTCP: periodically, the video server receives Receiver Reports (RR), while the end users receive Sender Reports (SR). From the SR and RR results, packet loss, jitter and delay can be calculated. In this system, packet loss is chosen as the reference factor to control the video quality. Previous experimental results show that when the packet loss is below 4%, the video quality is acceptable; if it is higher than 7%, the decoded video is almost unusable; and if it is lower than 0.5%, the video shows little quality degradation. Based on these results, minimum and maximum packet loss limits are set in the video server as well as in the end users.
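The loss thresholds above can be expressed as a simple control rule. The following is a minimal sketch, assuming the server adjusts Qp by one step per Receiver Report; the step size, bounds and function names are illustrative assumptions, not taken from the paper.

```python
def adjust_qp(qp: int, loss: float, low: float = 0.005, high: float = 0.04) -> int:
    """Server side: adjust the quantization parameter from the packet loss in the last RR.

    A larger Qp coarsens quantization and therefore lowers the output bit rate.
    The 0.5% and 4% thresholds follow the paper; the one-step change is an assumption.
    """
    if loss > high:              # too many losses: shrink the output stream
        return min(qp + 1, 51)   # 51 is the H.264 Qp upper bound
    if loss < low:               # very clean channel: raise the quality
        return max(qp - 1, 0)
    return qp                    # acceptable loss: leave the encoder alone

def keep_enhancement_layer(loss: float, damage: float = 0.07) -> bool:
    """End-user side: reject the EL once loss passes the 7% level at which video is unusable."""
    return loss < damage

print(adjust_qp(30, 0.06))           # -> 31 (loss above 4%, reduce bit rate)
print(adjust_qp(30, 0.001))          # -> 29 (loss below 0.5%, raise quality)
print(keep_enhancement_layer(0.08))  # -> False (drop the EL, keep only the BL)
```

Server and receivers thus act together: the server throttles the encoder output while each receiver independently decides whether the enhancement layer is still worth subscribing to.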
If the packet loss exceeds the limits, the quantization parameter (Qp), which controls the output bit stream of the encoder, is adjusted. At the same time, every end user decides whether to accept or reject the video layer(s). This approach is illustrated in Figure 8. With this method, the server and the end users control the video quality automatically together, which not only saves bandwidth but also improves the robustness of the video.

Figure 8. Quality control algorithm

7. System Evaluation and Results

To evaluate the effectiveness and performance of the proposed video streaming system, the new approaches were implemented and tested in a previously developed WMN testbed [4]. Experiments were done in a home environment, as shown in Figure 9. A camera and a computer work together as the video server, with all the server algorithms implemented in the computer. Three mesh routers are set up in three different rooms. The video server sends one base layer (BL) and one enhancement layer (EL) simultaneously in two multicast groups. One end user with a small screen, a PDA, receives the base layer; the other, a laptop with a big screen, receives both layers. The distance between the devices can be adjusted to achieve diverse network conditions.

Figure 9. Schematic illustration of experiments

To test the effectiveness of the quality control method, three working scenarios were recorded, as shown in Figures 10-12. In the figures, GOP stands for group of pictures, which consists of four frames. After sending one GOP, the video server receives the RR report.

In Figure 10, since the packet loss is less than 0.5%, the server increases the output bit stream; the trends of both lines go up, so every receiver can receive good quality videos.

Figure 10. Bit stream versus GOP with packet loss lower than 0.5%

Under the usual network condition shown in Figure 11, the packet loss of the EL is more than 4% at the beginning, so the server reduces the EL output stream until the packet loss drops below 4%. In this figure, the output bit streams of the two layers change frequently based on the network condition. At GOP 14, because of large motion captured in the picture, the bit stream increases unexpectedly.

Figure 11. Bit stream versus GOP with low packet loss

In Figure 12, with a bad network condition, the EL is dropped by the end user to reduce the bit stream in the network, so as to ensure that the BL can be received with a steady and acceptable quality.

Figure 12. Bit stream versus GOP with high packet loss

From the test results, it is evident that the system maintains good control of the quality of service under varying network conditions. With the help of the wireless mesh network technology and the proposed scalable video streaming scheme, this system can deliver flexible and diverse videos to end users, which makes education delivery easier than with a general video transmission system. In addition, the quality control in the system mitigates the instability of wireless networks.

8. Conclusion

A scalable video streaming system in a wireless mesh network was designed in this paper. To improve the robustness of the video transmission, the video control messages were transferred by an extended SDP at the beginning of the transmission phase, and the updated video information was transmitted by RTCP during the video transfer phase. In addition, based on the network condition, the output video bit streams can be adjusted automatically by the server, and end users can accept or reject the relevant video layers. The test results show that the system provides good real-time video performance in a wireless mesh network.

9. References

[1] L. Moody, G. Schmidt, "Going wireless: the emergence of wireless networks in education," J. of Computing Sciences in Colleges, Vol. 19, No. 4, 2004, pp. 151-158.
[2] X. H. Wang, "Wireless Mesh Networks," J. of Telemed. and Telecare, Vol. 14, No. 8, 2008, pp. 401-403.
[3] M. Iqbal, X. H. Wang, D. Wertheim, and X. Zhou, "SwanMesh: A Multicast Enabled Dual-Radio Wireless Mesh Network for Emergency and Disaster Recovery Services," J. of Communications, Vol. 4, Issue 5, 2009.
[4] X. H. Wang, M. Iqbal, and X. Zhou, "Design and Development of a Dual Radio Wireless Mesh Network for Healthcare," 5th Intern. Conf. Information Tech. and App. in Biomedicine (ITAB 2008), 30-31 May 2008, Shenzhen, China, pp. 300-304.
[5] http://linux.bytesex.org/v4l2/
[6] http://www.libsdl.org/
[7] Joint Video Team of ITU-T and ISO/IEC JTC 1, "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification," Redmond, WA, July 2004, JVT-L050.
[8] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, 2003, pp. 560-576.
[9] M. Handley, V. Jacobson, C. Perkins, "SDP: Session Description Protocol," RFC 4566, IETF, July 2006.
[10] http://tools.ietf.org/html/rfc4566