An Advanced Simulation Tool-set for Video Transmission Performance Evaluation

Ku-Lan Kao, Institute of Computer and Communication Engineering, Dept. of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan City, 70101, Taiwan, kulan@hpds.ee.ncku.edu.tw
Chih-Heng Ke, Institute of Computer and Communication Engineering, Dept. of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan City, 70101, Taiwan, smallko@ee.ncku.edu.tw
Ce-Kuen Shieh, Institute of Computer and Communication Engineering, Dept. of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan City, 70101, Taiwan, shieh@ee.ncku.edu.tw

ABSTRACT
This paper presents an advanced simulation tool-set for video transmission performance evaluation. The tool-set integrates EvalVid with NS-2 and adds a new coding mechanism, Multiple Description Coding (MDC). With the integrated simulator, researchers can easily evaluate the transmission performance of videos that vary in codec or coding mechanism, and can analyze their own designs, such as improved network protocols or QoS control schemes, in a realistic simulation environment without delving into the simulator itself. Moreover, the tool-set reports video transmission quality not only through conventional evaluation metrics but also as a real recombined video that can be played out and observed. Researchers who use the tool-set can therefore quickly verify their designs for video transmission over wireless networks.

Categories and Subject Descriptors
C.2.1 [Computer-Communication Networks]: Network Architecture and Design: wireless communication.

General Terms
Measurement, Performance, Design, Experimentation.

Keywords
Simulation Tool, Performance Evaluation, Video Transmission, Multimedia Communication, Wireless Network.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WNS2, October 10, 2006, Pisa, Italy. Copyright 2006 ACM 1-59593-508-8/06/10 $5.00

1. INTRODUCTION
Given the recent convenience and ubiquity of the Internet, more and more users communicate through portable devices such as laptops, mobile phones, and PDAs. The demand for Internet multimedia communication services therefore grows daily, and publications and research papers increasingly focus on this topic. Unfortunately, not all network researchers are versed in cross-layer design: some specialize in multimedia coding but are unfamiliar with the underlying networks, while others have trouble experimenting with a new idea or theory. A powerful and easy-to-use tool is thus of great value to researchers, and we developed this simulation tool-set to help them.
Previous studies of video transmission [1][2][3], when evaluating delivered video quality, have adopted VBR/CBR traffic flows or H.263/H.264/MPEG-4 video trace files rather than real video as the source stream in their simulation environments. Such traces provide only network-level metrics such as throughput, delay, jitter, and loss rate. These metrics reflect network state, but may be insufficient to rate the quality perceived by the end user. Their only advantage is that researchers do not need to understand the details of video encoding and decoding.
The trade-off is that researchers can neither adjust the video coding parameters nor apply other coding mechanisms to the same video source. In this paper, we present an evolved simulation tool-set that integrates EvalVid [4] into NS-2 [5] and adds multiple description coding (MDC) [6], making the tool-set more useful for measuring delivered video quality. We added interfaces for video transmission over wireless networks to EvalVid to improve the simulation model. With these enhancements, network researchers can evaluate real video streams on their proposed network designs or protocols, and video researchers can evaluate the quality of their video coding mechanisms over a more realistic network. The proposed tool-set is publicly released [7]. Moreover, the new tool-set accepts both raw YUV video files [8] and downloadable video traffic trace files [9] as simulation sources. Researchers can thus devote themselves to their primary research area without bothering with low-level details, and have a broader range of video sources with which to establish the reliability of their simulations. As the old saying goes, seeing is believing: our tool-set recombines the received video frames into real videos that can be played out before our eyes. Furthermore, although PSNR, which compares each pixel of the original and distorted frames, is the most common metric of end-user perceived quality, it takes a long time to compute. We alternatively use the decodable frame rate (Q) to measure video quality at the receiver, which takes less time than PSNR; in this paper, the decodable frame rate is therefore one of our main video evaluation metrics.
The remainder of this paper is organized as follows. Section 2 introduces EvalVid and some common performance metrics. Section 3 describes the interfaces added between NS-2 and EvalVid and our new tool myevalvid-nt. Section 4 brings multiple description coding (MDC) into the simulation tool-set. We then show examples demonstrating the tool-set's usefulness, and finally summarize the paper and present future work.

2. RELATED WORKS
EvalVid is an existing tool for evaluating the quality of video transmitted over a real or simulated communication network. It assists researchers in evaluating their own network mechanisms or protocols by presenting the video as perceivable at the user end. A brief introduction of the original EvalVid follows.
2.1 Overview of Evalvid
The structure and main components of the EvalVid evaluation framework are as follows.
Source: Two video source formats are supported, YUV QCIF (176 x 144) and YUV CIF (352 x 288).
Video Encoder and Video Decoder: EvalVid presently supports two MPEG-4 codecs, the NCTU codec [10] and ffmpeg [11].
VS (Video Sender): This component reads the compressed video file produced by the video encoder, fragments large video frames into smaller segments, and transmits these segments over a real or simulated network. For each transmitted packet, the timestamp, packet id, and packet payload size are recorded in the sender trace file. The component also generates a video trace file containing information about every frame of the real video file. The video trace file and the sender trace file are later used for video quality evaluation.
ET (Evaluate Trace): Evaluation takes place at the sender side once the video transmission is over; hence the timestamp, packet id, and packet payload size recorded at the receiver must be transported back to the sender. From the original encoded video file, the video trace file, the sender trace file, and the receiver trace file, the ET component reports frame/packet loss and frame/packet jitter, and generates a reconstructed video file corresponding to the possibly reproduced video found at the end-user side. The component also considers a frame lost if it arrives later than its predefined playback time.
Figure 1. The framework of the Evalvid tool-set.
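The VS packetization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not EvalVid's actual code: the frame sizes are made up, and the sender trace field layout (timestamp, packet id, payload size) simply follows the description in the text.

```python
# Sketch of the VS (Video Sender) fragmentation step: each encoded frame
# is split into packets no larger than a maximum payload, and one sender
# trace record is emitted per packet.

MAX_PAYLOAD = 1024  # bytes per packet; matches the case study in section 5

def fragment_frame(frame_size, max_payload=MAX_PAYLOAD):
    """Split one encoded frame into a list of packet payload sizes."""
    full, rest = divmod(frame_size, max_payload)
    return [max_payload] * full + ([rest] if rest else [])

def sender_trace(frame_sizes, start=0.0, interval=1.0 / 30):
    """Return (timestamp, packet_id, payload_size) records, one per packet,
    assuming one frame is sent every `interval` seconds."""
    records, pkt_id = [], 0
    for n, size in enumerate(frame_sizes):
        for payload in fragment_frame(size):
            records.append((start + n * interval, pkt_id, payload))
            pkt_id += 1
    return records

# a 3000-byte frame becomes packets of 1024, 1024, and 952 bytes
print(fragment_frame(3000))
```

The ET component later joins such records with the receiver-side trace to decide which packets, and hence which frames, arrived in time.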
Figure 2. Interfaces between NS-2 and Evalvid.
FV (Fix Video): Video quality measurement is performed frame by frame, so the receiver-side video must contain the same total number of frames, erroneous frames included, as the original sender-side video. If the codec cannot deal with missing frames, the FV component applies an error concealment technique that inserts the last successfully decoded frame in place of each lost frame [12].
PSNR (Peak Signal to Noise Ratio): PSNR is one of the most widespread objective metrics for assessing the application-level QoS of video transmissions. It measures the error between a reconstructed image and the original one. The PSNR between the luminance component Y of source image S and destination image D for frame n is defined as

  PSNR(n) = 20 * log10( Vpeak / sqrt( (1 / (Ncol * Nrow)) * sum_{i,j} [ Y_S(n, i, j) - Y_D(n, i, j) ]^2 ) )

where Vpeak = 2^k - 1, k is the number of bits per pixel of the luminance component, and Ncol and Nrow are the numbers of pixel columns and rows.
MOS (Mean Opinion Score): MOS is a subjective metric of application-level video quality. This measure of human quality impression is usually given on a scale from 1 (worst) to 5 (best).
However, the network environment simulated by EvalVid itself is too simple to support the realistic and complex scenarios needed for evaluating video transmission quality, so its simulation results are of limited credibility. We therefore evolved the original EvalVid into an enhanced version and developed a further simulation tool-set, myevalvid-nt, for evaluating video transmission quality.
3. Enhanced Evalvid and myevalvid-nt
As mentioned above, EvalVid adopts overly simple models and functions, such as a basic error model representing corrupted or lost packets, and thus falls short of being an accurate and reliable simulator. Most performance metrics used in EvalVid experiments are also network-level measures. We therefore modified and extended the original NS-2 files, integrated EvalVid into NS-2, and added several functions.
We take the unequal significance of I, P, and B frames into consideration in our work, and assign them different priorities for later use in the decodable frame rate calculation. The proposed tool-set and modules are publicly available at [13] with detailed descriptions, and several papers are already based on this tool-set [14][15][16][17][18]. The first part of this section introduces the enhanced EvalVid architecture, the new agents we added to NS-2, and an application-level performance metric, the decodable frame rate (Q). The second part introduces myevalvid-nt.
Figure 3. The architecture of myevalvid-nt.
3.1 New network simulation agents
Figure 2 illustrates the video traffic QoS assessment framework enabled by the new tool-set combining EvalVid with NS-2. Three connecting simulation agents, MyTrafficTrace, MyUDP, and MyUDPSink, are implemented between NS-2 and EvalVid. These interfaces either read the video trace file or generate the data required to evaluate the delivered video quality.
MyTrafficTrace: This agent extracts the frame type and frame size from the video trace file generated by the EvalVid VS component. It also fragments video frames into smaller segments and sends them to the lower UDP layer at the appropriate times, according to the user settings specified in the simulation script file.
MyUDP: This agent, essentially an extension of the UDP agent, allows users to specify the output file name of the sender trace file, and records the timestamp, packet id, and payload size of each transmitted packet.
MyUDPSink: This is the receiving agent for the fragmented video frame packets sent by MyUDP. It records the timestamp, packet id, and payload size of each received packet in the user-specified file.
Decodable Frame Rate (Q): Standard MPEG encoders generate three distinct frame types, namely I, P, and B frames. Due to the MPEG hierarchical structure, I frames are more important than P frames, which in turn are more important than B frames. By definition, a frame is considered decodable only when at least a fraction dt (the decodable threshold) of the frame data is received. With dt = 1, a frame is decodable only when all fragmented packets of the frame, and all packets of the frames it depends on, are completely received and decodable. The decodable frame rate (Q) is then defined as the number of decodable frames divided by the total number of frames sent by the video source.
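The decodable frame rate defined above can be sketched in a few lines. The dependency model here is a deliberate simplification (an I frame depends on nothing, a P frame on the previous reference, a B frame on the two surrounding references), and dt is fixed at 1, i.e. a frame's data counts as received only when all of its packets arrived; the actual tool-set's bookkeeping is more detailed.

```python
def decodable_frame_rate(frames):
    """frames: list of (frame_type, all_packets_received) in display order.
    Returns Q = decodable frames / total frames sent."""
    n_decodable = 0
    last_ref_ok = False  # was the most recent I/P reference decodable?
    pending_b = []       # B frames seen since the last reference
    for ftype, received in frames:
        if ftype in ("I", "P"):
            # an I frame depends on nothing; a P frame also needs the
            # previous reference in its dependency chain to be decodable
            ok = received if ftype == "I" else (received and last_ref_ok)
            n_decodable += ok
            # B frames need both surrounding references to be decodable
            n_decodable += sum(b and last_ref_ok and ok for b in pending_b)
            pending_b, last_ref_ok = [], ok
        else:  # "B"
            pending_b.append(received)
    # trailing B frames only have a preceding reference in this sketch
    n_decodable += sum(b and last_ref_ok for b in pending_b)
    return n_decodable / len(frames)

# a fully received IBBPBB group is fully decodable (Q = 1.0)
print(decodable_frame_rate([("I", True), ("B", True), ("B", True),
                            ("P", True), ("B", True), ("B", True)]))
```

Losing a single P frame also makes the B frames around it undecodable, which is exactly why the I/P/B counts in section 5 fall at different rates.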
3.2 myevalvid-nt (Network Trace)
Previous studies often evaluate their proposed network mechanisms in a simulation environment using publicly available raw video traces. Many recent studies instead adopt the traffic traces of [9], a publicly available library of frame-size traces of long MPEG-4 and H.263 encoded videos. These traffic traces are much longer, typically 60 minutes each, and the frame-size traces are generated from several video sequences. To our knowledge, however, no publicly available tool-set performs a comprehensive delivered-quality evaluation using these traffic traces in a network simulation environment. Consequently, building on our enhanced EvalVid system, we developed another version, myevalvid-nt, based on NS-2 [19]. Because of an incompatible Application Programming Interface (API), some old traffic traces can be used only under Linux and not under Cygwin. We therefore propose a new application sink agent, myevalvid_sink, to address this problem; myevalvid_sink behaves like myudpsink and differs only in the API for the input traffic trace file.
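A traffic trace of the kind just described is essentially a text file with one frame per line. The reader below is a hypothetical sketch: the column layout (frame number, frame type, frame size in bytes) and the '#' comment convention are assumptions for illustration only, since each trace library documents its own format.

```python
def read_frame_trace(lines):
    """Parse trace lines into (frame_no, frame_type, size) tuples,
    skipping blank lines and '#' comment lines."""
    frames = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        frames.append((int(cols[0]), cols[1], int(cols[2])))
    return frames

sample = ["# frame  type  size", "0 I 3110", "1 B 412", "2 B 389", "3 P 1288"]
print(read_frame_trace(sample)[0])  # (0, 'I', 3110)
```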
Figure 4. Architecture of MDC framework.
4. MULTIPLE DESCRIPTION CODING
An attractive recent way to enhance the reliability of a communication system is to use multiple description coding (MDC) [6] at the source coder. In MDC, several coded bit-streams of the same signal are generated, and each stand-alone stream is independently decodable. A better signal reproduction is achieved as more descriptions are received, but the decoded signal quality is acceptable even with a single description. That is, when only some of the streams are received, the reconstruction quality degrades gracefully, which is very unlikely in a system designed purely for compression; this makes MDC attractive at present. We have integrated the MDC mechanism into our tool-set [20]. The main components of the MDC video transmission evaluation framework are as follows.
Raw Video Sequence: The video source can be in either the YUV QCIF (176 x 144) or the YUV CIF (352 x 288) format.
Splitter: In this framework, a frame-based approach splits the video into multiple descriptions. The splitter program takes the raw video sequence and splits it into i sub-streams such that the n-th sub-stream contains pictures n, n+i, n+2i, and so on.
Parser: The parser program reads each compressed video sub-stream from the video encoder output and generates a traffic trace file containing the frame id, frame type, frame size, and designated sending time.
Evaluate Trace: After the simulation, counting the records in the sender trace file and the receiver trace file indicates how many packets were sent and received, so the packet loss rate is easily calculated. The end-to-end packet delay is obtained by subtracting the sending time from the receiving time. A distorted video file corresponding to the possibly corrupted video found at the receiver side can also be produced.
The possibly corrupted video is generated by copying the original compressed video file packet by packet while omitting the packets lost or corrupted during transmission.
Merger: After each received video file is decoded, the decoded distorted video sequences are fed into the merger program to generate the reconstructed raw video sequence. Because digital video quality assessment, such as PSNR, is performed frame by frame, the total number of frames in the reconstructed raw video sequence must equal that of the original video. If some sub-streams are lost, the merger program applies simple error concealment by copying the last successfully decoded frame into the lost frames until a correctly decoded frame is found.
In our tool-set, there are two ways to prepare an MDC video transmission simulation. One is to encode the raw video into the coded video stream files; the other is to use prepared MDC traffic traces, which can be downloaded from [21]. The advantage of the latter is that network researchers need not know in detail how to encode a video in MDC mode. On the other hand, it is then harder to study the effects of a proposed network mechanism across different characteristics of the same video, because the encoding settings of the publicly available traffic traces are fixed.
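The splitter and merger behavior described in this section can be sketched as follows. The sketch does frame-based round-robin splitting over plain lists and applies the merger's copy-last-decoded-frame concealment; the 0-based indices and the `None`-marks-a-lost-frame convention are illustrative assumptions, not the tool-set's actual file formats.

```python
def split(frames, i):
    """Round-robin split: sub-stream n carries frames n, n+i, n+2i, ..."""
    return [frames[n::i] for n in range(i)]

def merge(substreams, total):
    """Re-interleave the sub-streams; a None entry marks a lost frame and
    is concealed by repeating the last successfully decoded frame."""
    i = len(substreams)
    merged, last = [], None
    for n in range(total):
        frame = substreams[n % i][n // i]
        if frame is None:        # lost: copy the last decoded frame
            frame = last
        merged.append(frame)
        if frame is not None:
            last = frame
    return merged

subs = split(list(range(6)), 2)   # [[0, 2, 4], [1, 3, 5]]
subs[1][1] = None                 # pretend frame 3 of the video was lost
print(merge(subs, 6))             # [0, 1, 2, 2, 4, 5]
```

Note that even if an entire sub-stream is lost, every other frame still survives in the merged output; that is the graceful degradation MDC is designed for.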
5. CASE STUDY
In this section, we provide an example demonstrating the usefulness of the evaluation framework. We vary the packet error rate to study the quality of video transmitted over an error-prone wireless network.
5.1 Simulation topology
The simulation topology used in this experiment is shown in figure 5. The video server transmits video streams over the Internet and a wireless link to the video receiver. The maximum transmission packet size is 1024 bytes. The link between the wireless access point and the video receiver is IEEE 802.11b at 11 Mbps. For simplicity, we assume that the link between the video server and the wireless access point has a 10 Mbps bandwidth and 10 ms latency, and that no packet loss occurs in the wired segment of the video delivery path.
Figure 5. Simulation topology.
5.2 Simulation result
In this experiment, the video sources are akiyo_qcif.yuv and news_qcif.yuv, which are publicly available on the website [8]. First, we encode akiyo_qcif.yuv and news_qcif.yuv into MPEG-4 video trace files as in table 1; both are composed of 300 frames, comprising 34 I frames, 67 P frames, and 199 B frames. Each video frame is segmented into small packets for transmission: for video akiyo, the I frames total 236 packets, the P frames 100, and the B frames 199; for video news, the I frames total 240 packets, the P frames 105, and the B frames 199. Under the same simulation environment, we vary the packet error rate and compare the delivered qualities of the two videos. The packet error rate ranges from 0 to 0.2 in intervals of 0.01. As shown in figures 6 and 7, the numbers of decodable I, P, and B frames of the video flows decrease, as expected, when the packet error rate increases. For video akiyo at an error rate of 0.01, 535 packets are sent in total, 527 are received at myevalvid_sink, and 242 frames are decodable, including 29 I frames, 57 P frames, and 156 B frames.
Figure 6. I/P/B decodable frame number versus packet error rate of video akiyo.
Figure 7. I/P/B decodable frame number versus packet error rate of video news.
Therefore, the decodable frame rate (Q) is 0.809365 when the packet error rate is 0.01, as shown in figure 8. When the error rate is 0.09, 118 frames are decodable, including 19 I frames, 31 P frames, and 68 B frames, so Q is 0.394649.
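The PSNR values reported for these experiments follow the standard definition given in section 2, PSNR = 20 log10(Vpeak / RMSE) over the luminance plane, with Vpeak = 2^k - 1 (255 for 8-bit video). A minimal per-frame sketch over flat pixel lists, not the tool-set's actual PSNR program:

```python
import math

def psnr(src, dst, k=8):
    """PSNR (dB) between two equally sized luminance planes (flat lists)."""
    vpeak = 2 ** k - 1
    mse = sum((s - d) ** 2 for s, d in zip(src, dst)) / len(src)
    if mse == 0:
        return float("inf")  # identical frames
    return 20 * math.log10(vpeak / math.sqrt(mse))

# an off-by-one error on every pixel gives MSE = 1, i.e. 20*log10(255)
print(round(psnr([100] * 64, [101] * 64), 2))  # 48.13
```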
Figure 8. Decodable frame rate versus packet error rate.
Figure 9. PSNR versus packet error rate.
Additionally, although the decodable frame rates of the two videos look similar, their PSNR values differ greatly. The video stream PSNR decreases as the packet error rate increases, as shown in figure 9. When the packet error rate is 0.01, the PSNR is 43.249249 for video akiyo and 37.92323 for video news; at 0.09, it is 38.933605 for akiyo and 30.03439 for news; at 0.19, it is 30.3899 for akiyo and 27.2702 for news.
The simulation yields transmission performance figures, but the real video quality is still unknown. To confirm that a larger decodable frame rate and PSNR indeed mean better delivered video quality, our tool-set restores these frame sequences into real video. After recomposing the sequential frames into YUV video streams, we can play the videos with the YUVviewer application and observe their quality. Figure 10 shows frames 70, 180, 250, and 299 of the reconstructed YUV videos akiyo and news at packet error rates of 0.01 and 0.19. Careful comparison shows that the quality of the motionless parts is acceptable, while the moving parts are seriously distorted; this cannot be discovered from the numeric information alone. Verifying these different aspects with our tool-set confirms that the delivered video quality is better when the decodable frame rate and PSNR are larger. In our tool-set, I, P, and B frames have different priorities and are accumulated separately for later analysis, which is convenient for researchers studying new video coding or concealment mechanisms, or new transmission and routing protocols that account for I, P, and B frame priority.
6. CONCLUSIONS
A realistic simulation tool-set that integrates EvalVid with NS-2 and adds a commonly used mechanism, MDC, is presented in this paper.
We added new video and wireless transmission interfaces to support the evaluation of video transmission over wireless networks. These enhancements let both network and video researchers easily evaluate the delivered video quality of their designs in a simulated environment. We calculate the decodable frame rate and PSNR for performance evaluation, so researchers using the tool-set can assess video quality not only with numeric metrics but also with real video sequences. According to our statistics and mailing lists, several projects and papers are currently based on our tool-set; for instance, some focus on QoS support and others on multimedia communication in mobile networks. The appreciation of these researchers convinces us that the tool-set is making a significant contribution, and their differing opinions remind us that there is still room for progress. In brief, the proposed framework is a good choice for researchers who want to verify their multimedia communication designs, such as network protocols or video coding algorithms, and we believe they can make efficient and persuasive progress with this tool-set.
Figure 10. Reconstructed YUV video with packet error rates 0.01 and 0.19.
7. REFERENCES
[1] Patrick Seeling, Martin Reisslein, and Beshan Kulapala, "Network Performance Evaluation Using Frame Size and Quality Traces of Single-Layer and Two-Layer Video: A Tutorial", IEEE Communications Surveys and Tutorials, vol. 6, no. 2, pp. 58-78, Third Quarter 2004.
[2] O. Rose, "Statistical Properties of MPEG Video Traffic and Their Impact on Traffic Modeling in ATM Systems", Report No. 101, Institute of Computer Science, University of Würzburg, February 1995.
[3] Frank H.P. Fitzek and Martin Reisslein, "MPEG-4 and H.263 Video Traces for Network Performance Evaluation", IEEE Network, vol. 15, no. 6, pp. 40-54, November/December 2001.
[4] J. Klaue, B. Rathke, and A. Wolisz, "EvalVid - A Framework for Video Transmission and Quality Evaluation", in Proc. of the 13th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, Urbana, Illinois, USA, September 2003.
[5] NS, http://www.isi.edu/nsnam/ns/.
[6] Frank H.P. Fitzek, B. Can, R. Prasad, and M. Katz, "Overhead and Quality Measurements for Multiple Description Coding for Video Services", Wireless Personal Multimedia Communications (WPMC), September 2004.
[7] http://hpds.ee.ncku.edu.tw/~smallko/ns2/evalvid_in_ns2.htm.
[8] http://www.tkn.tu-berlin.de/research/evalvid/.
[9] http://www.tkn.tu-berlin.de/research/trace/trace.html.
[10] NCTU codec, http://megaera.ee.nctu.edu.tw/mpeg/.
[11] ffmpeg, http://ffmpeg.sourceforge.net/index.php.
[12] Y. Wang and Q.-F. Zhu, "Error Control and Concealment for Video Communication: A Review", Proceedings of the IEEE, vol. 86, no. 5, pp. 974-997, May 1998.
[13] http://hpds.ee.ncku.edu.tw/~smallko/ns2/evalvid_in_ns2.htm.
[14] Chih-Heng Ke, Cheng-Han Lin, Ce-Kuen Shieh, and Wen-Shyang Hwang, "A Novel Realistic Simulation Tool for Video Transmission over Wireless Network", IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC 2006), Taichung, Taiwan, June 5-7, 2006.
[15] Chih-Heng Ke, Ce-Kuen Shieh, Wen-Shyang Hwang, and Artur Ziviani, "A Two Markers System for Improved MPEG Video Delivery in a DiffServ Network", IEEE Communications Letters, vol. 9, no. 4, pp. 381-383, April 2005.
[16] J. Naoum-Sawaya, B. Ghaddar, S. Khawam, H. Safa, H. Artail, and Z. Dawy, "Adaptive Approach for QoS Support in IEEE 802.11e Wireless LAN", IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2005), Montreal, Canada, August 2005.
[17] H. Huang, J. Ou, and D. Zhang, "Efficient Multimedia Transmission in Mobile Network by using PR-SCTP", Communications and Computer Networks (CCN 2005), Marina del Rey, USA, October 24-26, 2005.
[18] A. Lo, G. Heijenk, and I. Niemegeers, "Performance Evaluation of MPEG-4 Video Streaming over UMTS Networks using an Integrated Tool Environment", in Proc. SPECTS 2005, International Symposium on Performance Evaluation of Computer and Telecommunication Systems, Philadelphia, PA, USA, July 24-28, 2005.
[19] http://hpds.ee.ncku.edu.tw/~smallko/ns2/myevalvidnt.htm.
[20] http://hpds.ee.ncku.edu.tw/~smallko/ns2/mdc.htm.
[21] http://trace.eas.asu.edu/mdc/index.htm.