Journal of Computational Information Systems 10: 24 (2014) 10747-10754 Available at http://www.jofcis.com

A QoE Based Video Adaptation Algorithm for Video Conference

Jianfeng DENG 1,2, Ling ZHANG 1
1 School of Computer Science & Engineering, South China University of Technology, Guangzhou 510641, China
2 College of Computer Science & Information Technology, Guangxi Normal University, Guilin 541004, China

Abstract

Video adaptation is an efficient method for transferring a video stream in a heterogeneous environment. This paper focuses on video adaptation using the Scalable Video Coding (SVC) extension of H.264/AVC. We propose a QoE based video adaptation algorithm. The algorithm uses a Pseudo-Subjective Quality Assessment (PSQA) method to build a QoE prediction model that assesses the quality of the adapted video stream according to the three SVC scalability parameters (spatial, temporal, and quality scalability). Then, based on the predicted quality, the video is adapted so that the QoE is maximized under the resource constraint. Experimental results demonstrate that the proposed video adaptation algorithm significantly outperforms a conventional video adaptation algorithm in terms of the QoE of the adapted video.

Keywords: Scalable Video Extension; QoE; Adaptation; PSQA

1 Introduction

Video applications have become increasingly popular on the network in recent years. Many real-time video applications, such as video conferencing and remote education, are widely used. Terminal devices differ in display size and processing capability, and the ways they access the network also differ. On the other hand, transferring real-time video content is very different from the other Internet applications that IP was originally designed for, especially in a best-effort environment. Therefore, transferring video among different categories of terminals over a heterogeneous network while maintaining video quality remains a big challenge.
This work is supported by the large-scale high definition video conference system application demonstration of China Next Generation Internet (Grant No. CNGI2008-118) and the Open Fund of the Communication & Computer Network Lab of GuangDong (Grant No. 201108). Corresponding author. Email address: deng.jf@mail.scut.edu.cn (Jianfeng DENG).

1553-9105 / Copyright 2014 Binary Information Press
DOI: 10.12733/jcis12758 December 15, 2014
Video adaptation [1, 2] is an effective method to deal with this challenge. Video adaptation transforms an input video into an output one by applying a set of coding and quality evaluation algorithms according to the environment of the video stream. The SVC standard [3] provides spatial, temporal, and quality scalability. For spatial scalability, the video stream has a multi-layer structure: a base layer and possibly several enhancement layers. The base layer corresponds to the base resolution, enhancement layers extend it to higher resolutions, and the Extended Spatial Scalability (ESS) of SVC can crop the video size. Temporal scalability allows frames to be extracted to obtain different frame rates. Each quality enhancement layer has a different quantization parameter (QP) value, and Medium Grain Scalability (MGS) allows individual packets to be extracted to obtain finer quality granularity. SVC thus allows on-the-fly adaptation to different display resolutions, terminal processing capabilities, and network connections, which makes it well suited for video adaptation. Many quality assessment methods have been proposed to evaluate the quality of a video stream. Traditional objective metrics such as Peak Signal to Noise Ratio (PSNR) are often used for evaluating SNR-scaled video streams. However, it is hard to compare the quality of videos with different resolutions and frame rates, and it was shown in [4, 5] that objective assessment does not necessarily correlate with human perception. The concept of Quality of Experience (QoE) [6] was proposed to evaluate the human perception of service quality: QoE is directly measured as the overall acceptability of a service by humans.
However, subjective evaluation requires a group of human judges to rate the subjective quality of a video sequence. This is very labor-intensive and hard to adopt in real-time systems. To address these issues, Pseudo-Subjective Quality Assessment (PSQA) methods [7] were proposed. PSQA methods use a set of subjective assessment data to build a QoE prediction model, and then use that model to predict the QoE automatically in real time. Many QoE based adaptation algorithms have been proposed in the literature. The adaptation algorithm in [8] dynamically adapts scalable video to a suitable combination of the three scalability dimensions (spatial, temporal, and quality). The paper constructs a five-dimensional space (encoder type, video content, bit rate, frame rate, and frame size) and, through subjective tests, summarizes a set of rules used by the adaptation algorithm. The adaptation proceeds in several steps; in each step a set of rules controls the process, and the rules are very intuitive, e.g., if the current jerkiness is very high, increase the frame rate. A non-intrusive QoE prediction model for low bit rate and low resolution videos is proposed in [9]. From subjective experiments, the paper considers four important parameters: content type, sender bit rate, block error rate, and mean burst length. It summarizes a set of rules and then proposes an adaptation algorithm that applies the rules with fuzzy-logic techniques. In [10], researchers propose a QoE based adaptation framework for video conferencing. They perform a large number of subjective tests to study the relation between the main influential video parameters and the quality experienced by end users, and summarize a set of QoE adaptation rules. The available bandwidth of the video conference is estimated from the RTCP protocol.
With the available bandwidth and the adaptation rules, the paper proposes an adaptation mechanism for video conferencing. In [11], researchers investigate QoE-optimized rate tailoring. The paper introduces the concept of a track: a point is defined as a combination of the three scalability parameters, and the track is the collection of points that give the highest QoE at each coding rate. From subjective tests it is possible to obtain each individual video's track, and the paper proposes a search method based on the minimal mean absolute difference to find a common
track, which is then used for a group of videos. These QoE based SVC adaptation methods usually only propose guidelines to optimize the adaptation. However, QoE is complicated and not easy to describe with a few guidelines, so these algorithms only provide coarse adaptation mechanisms. PSQA quality assessment methods, such as the artificial neural network (ANN) based QoE assessment method, have demonstrated decent accuracy [7, 12]. To the best of our knowledge, no PSQA based video adaptation system has been reported in the literature. In this work, we use ANN based QoE assessment as the evaluation standard to adapt an SVC video stream under a network bandwidth constraint. A video database was built to train a QoE model, and the model is used to assess video quality. The optimal video adaptation problem can then be translated into a constrained optimization problem, where the constraint is the bandwidth and the objective function to be maximized is the quality of the adapted video given the three SVC scalability parameters. We use a pattern search method to find the parameter combination that gives the best quality according to the QoE model under the bandwidth constraint. Experiments show that the proposed algorithm is effective for SVC video adaptation and significantly outperforms a conventional video adaptation algorithm in terms of the QoE of the adapted video. The rest of the paper is organized as follows. Section II discusses a PSQA video quality evaluation model for H.264 SVC video streams. The adaptation algorithm is discussed in Section III. Section IV presents the experiments and analysis. Section V concludes the paper.

2 PSQA Based Video Quality Evaluation

In this section, we present the development of a pseudo-subjective video quality evaluation model for H.264 SVC video streams.
2.1 Video sequence generation

SVC supports three kinds of scalability: temporal, spatial, and quality. The video is coded only once, and different sub-streams containing different sets of scalability layers can be extracted. This feature is especially useful for applications like video conferencing: one video may be requested by users with different network conditions and terminal devices, and SVC can satisfy all the requests with a single encoding. Collecting the video sequence data takes two steps: first the video sequence is encoded; then sub video streams are extracted. To build an ANN-based QoE assessment model, a group of video sequences covering the conditions used in the target application is needed. We chose the resolutions QCIF, CIF, and 4CIF; the frame rates 1.875, 3.75, 7.5, 15, and 30 fps; and the quality parameters (QP values in JSVM [13]) 28, 32, 36, and 40. The videos were encoded with the JSVM 9.19.14 codec, the reference software for the scalable video extension of H.264/AVC. For temporal scalability, 5 layers were chosen according to the frame rates above. For quality scalability, medium-grain scalability (MGS) is used, which provides enough layers for the model. Since we focus on real-time streams, the configuration parameter MaxDelay is set to zero. The three scalability parameters are set to their highest values so that all the sub-streams can be extracted from
the encoded video. After encoding, the codec produces a video file. Using the extraction tool provided by JSVM, we can extract sub-streams from the video stream according to the three parameters. Altogether, a total of 60 (3 × 5 × 4) video streams are collected. We use this video data set for the subjective test.

2.2 Subjective test

The subjective assessment of video quality was carried out following ITU-T P.910 DCR. Each video clip is about 10 seconds long, and each judge is asked to rate the video quality with a MOS score within six seconds. The MOS score is on a 9-level scale from 1 to 9, where 9 means the video quality is excellent and 1 means it is bad. Each video sequence is rated by five observers. A total of 27 human observers participated in this subjective assessment; all of them are college students aged between 18 and 22. For each video, the subjective test results are used to calculate a mean score $\bar{u}_{s,t,q}$:

$$\bar{u}_{s,t,q} = \frac{1}{N} \sum_{i=1}^{N} u_{s,t,q,i} \qquad (1)$$

where $u_{s,t,q,i}$ is the score given by observer $i$ to video sequence $(s,t,q)$; $s$, $t$, and $q$ represent the spatial, temporal, and quality sequence numbers, respectively, and $N$ is the number of observers. Each mean score is associated with a confidence interval derived from the standard deviation. We used the 95% confidence interval $[\bar{u}_{s,t,q} - \delta_{s,t,q},\ \bar{u}_{s,t,q} + \delta_{s,t,q}]$, where

$$\delta_{s,t,q} = 1.96\,\frac{sd_{s,t,q}}{\sqrt{N}} \qquad (2)$$

and the standard deviation of each video is calculated by

$$sd_{s,t,q} = \sqrt{\sum_{i=1}^{N} \frac{(\bar{u}_{s,t,q} - u_{s,t,q,i})^2}{N-1}} \qquad (3)$$

After this calculation, each video sequence is assigned a single MOS score. These data are used to train the ANN model.

2.3 The ANN based QoE assessment model

Following [7, 12], we use an artificial neural network to build the quality assessment model. The ANN architecture we used is shown in Fig. 1.
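As a concrete illustration of the statistics in Eqs. (1)-(3), the following sketch computes the mean score, standard deviation, and 95% confidence half-width for one sequence; the five ratings shown are hypothetical, not actual data from our test.

```python
import math

def mos_statistics(scores):
    """Mean MOS (Eq. 1), sample standard deviation (Eq. 3), and 95%
    confidence half-width (Eq. 2) for one video sequence's ratings."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((mean - u) ** 2 for u in scores) / (n - 1))
    delta = 1.96 * sd / math.sqrt(n)
    return mean, sd, delta

# Hypothetical ratings from five observers on the 9-level scale.
mean, sd, delta = mos_statistics([7, 8, 6, 7, 8])
print(f"MOS = {mean:.2f} +/- {delta:.2f}")
```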
The SVC standard provides three scalability dimensions, and we use the three scalability parameters as the input of the ANN model. The number of hidden nodes was tuned on held-out data: we tested settings from 3 to 20 hidden nodes in cross-validation, and six hidden nodes gave the best performance, which is then used in all experiments in this paper. The output layer has one node, whose output is the MOS score corresponding to the combination of the three scalability parameters. In our experiments, the methods are implemented in Matlab. After training, the ANN model can estimate the QoE of an SVC stream from its three scalability parameters.
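A minimal sketch of such a 3-6-1 network is given below, implemented from scratch in Python/NumPy rather than Matlab. The training data here are made-up placeholder triples and targets, standing in for the subjective data set of Section 2.2; the point is only to show the structure (three scalability inputs, six sigmoid hidden nodes, one linear MOS output) and the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the subjective data set: 60 (spatial, temporal, quality)
# triples normalized to [0, 1], with hypothetical MOS targets in [0, 9].
X = rng.random((60, 3))
y = 9.0 * X.mean(axis=1, keepdims=True)  # made-up monotone target

# 3-6-1 network: six sigmoid hidden nodes (the cross-validated setting),
# one linear output node producing the predicted MOS.
W1 = rng.normal(0.0, 0.5, (3, 6)); b1 = np.zeros(6)
W2 = rng.normal(0.0, 0.5, (6, 1)); b2 = np.zeros(1)

def forward(X):
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid hidden layer
    return h, h @ W2 + b2                       # linear output = predicted MOS

_, pred = forward(X)
rmse_before = float(np.sqrt(np.mean((pred - y) ** 2)))

# Plain batch gradient descent on mean squared error.
lr = 0.05
for _ in range(5000):
    h, pred = forward(X)
    err = pred - y
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)           # backprop through sigmoid
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
rmse_after = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```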
Fig. 1: Artificial neural network

3 QoE Based Scalable Video Parameter Optimization

In this section we propose an algorithm that optimizes the scalability parameters when transmitting video under a network bandwidth constraint. We use the ANN based assessment model to evaluate the quality of the video encoded with the three scalability parameters. The optimization problem can be described as follows:

$$(s^*, t^*, q^*) = \operatorname*{argmax}_{(s,t,q)} U(s,t,q) \quad \text{s.t.} \quad R(s,t,q) \le B_{max} \qquad (4)$$

Here $(s,t,q)$ are the scalability parameters: resolution, frame rate, and quality. The ANN assessment model serves as the objective function $U$, whose inputs are $s$, $t$, and $q$ and whose output is the estimated MOS score $U(s,t,q)$. The function $R$, proposed in [14], calculates the bitrate of a video stream from its three parameters:

$$R(s,t,q) = R_m \left(\frac{q_{min}}{q}\right)^{a} \left(\frac{t}{t_{max}}\right)^{b} \left(\frac{s}{s_{max}}\right)^{c} \qquad (5)$$

Here $R_m$ is the maximum bitrate of the video, i.e., the bitrate when the video is encoded with the maximum resolution, maximum frame rate, and minimum QP; $q_{min}$ is the minimum QP value used in this experiment (28), $t_{max}$ is the maximum frame rate (30), and $s_{max}$ is the maximum resolution (4CIF). The parameters $a$, $b$, and $c$ are obtained by minimizing the root mean square error (RMSE) between the measured and predicted rates over all scalability parameters. For the calculation of parameter $c$, we first calculate the Normalized Rate versus Spatial resolution (NRS) [14] from the measured video bitrates:

$$R_s(s) = R_s(s; q, t) = \frac{R(q,s,t)}{R(q,s_{max},t)} \qquad (6)$$

Because

$$R_s(s) = \left(\frac{s}{s_{max}}\right)^{c} \qquad (7)$$
we then use RMSE fitting to obtain the parameter $c$. The parameters $a$ and $b$ are obtained in a similar way [14]. When the constraint is the bandwidth $B_{max}$, the constrained condition can be written as:

$$R_m \left(\frac{q_{min}}{q}\right)^{a} \left(\frac{t}{t_{max}}\right)^{b} \left(\frac{s}{s_{max}}\right)^{c} \le B_{max} \qquad (8)$$

Because the QoE of a video stream with a large bitrate is usually greater than that of one with a small bitrate, we consider the extreme case where the bitrate equals the maximum bandwidth. To find the optimal parameters, we first reduce the dimensionality by solving Eq. (8) with equality for $s$:

$$s = s_{max} \left(\frac{B_{max}}{R_m \left(\frac{q_{min}}{q}\right)^{a} \left(\frac{t}{t_{max}}\right)^{b}}\right)^{1/c} \qquad (9)$$

Substituting this $s$ into the ANN assessment model converts the problem into an unconstrained maximization problem:

$$(t^*, q^*) = \operatorname*{argmax}_{(t,q)} U\!\left( s_{max} \left(\frac{B_{max}}{R_m \left(\frac{q_{min}}{q}\right)^{a} \left(\frac{t}{t_{max}}\right)^{b}}\right)^{1/c},\ t,\ q \right) \qquad (10)$$

The function $U$ is non-convex and may have multiple local maxima. Therefore, we use a pattern search algorithm with multiple starting points to search for the maximum. Starting point selection is important for the pattern search algorithm: good starting points help the search reach the maximum quickly. In our method, we first build a grid covering the valid values of the scalability parameters and choose the grid intersections as starting points. The grid can be coarse; the range of each parameter can be divided into 3 or 4 parts. The experiments in Section II show that videos with larger bitrates usually have better quality, so we choose the intersection points whose coding rate is close to the available bandwidth; in our experiments, starting points in the range $[0.9\,B_{max},\ 1.1\,B_{max}]$ work well. After the search, we choose the scalability parameter combination that leads to the maximum MOS score as the result.

4 Experiment

In the experiment, we simulate transferring the video stream under different available bandwidths.
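Before describing the experimental setup, the optimization of Eqs. (8)-(10) can be sketched as follows. The exponents a, b, c come from Table 1; the value of R_m, the parameter units (s measured in CIF areas, rates in Mbps), the smooth stand-in quality function U, and the simplified starting grid are all assumptions for illustration only — in the real system U is the trained ANN model and the starting points follow the bandwidth heuristic above.

```python
import itertools

# Rate-model exponents from Table 1 (crew); R_m is an assumed max rate in Mbps.
A, B, C = 1.049, 0.644, 0.8608
R_M, Q_MIN, T_MAX, S_MAX = 4.905, 28.0, 30.0, 4.0  # s in CIF areas (4CIF = 4)

def rate(s, t, q):
    """Eq. (5): bitrate predicted from the three scalability parameters."""
    return R_M * (Q_MIN / q) ** A * (t / T_MAX) ** B * (s / S_MAX) ** C

def s_from_budget(t, q, b_max):
    """Eq. (9): the resolution that spends the whole bandwidth budget."""
    return S_MAX * (b_max / (R_M * (Q_MIN / q) ** A * (t / T_MAX) ** B)) ** (1.0 / C)

def U(s, t, q):
    # Stand-in for the trained ANN: a smooth score rising with resolution
    # and frame rate and falling with QP (purely illustrative, not the model).
    return 9.0 * (s / S_MAX) ** 0.3 * (t / T_MAX) ** 0.2 * (Q_MIN / q) ** 0.4

def pattern_search(f, x0, lo, hi, step=(4.0, 4.0), tol=1e-3):
    """Derivative-free coordinate pattern search maximizing f over (t, q)."""
    x = list(x0); fx = f(*x); step = list(step)
    while max(step) > tol:
        improved = False
        for i in range(2):                      # poll each coordinate
            for d in (step[i], -step[i]):
                cand = list(x)
                cand[i] = min(max(cand[i] + d, lo[i]), hi[i])
                fc = f(*cand)
                if fc > fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step = [v / 2.0 for v in step]      # shrink the pattern
    return x, fx

def adapt(b_max):
    """Multi-start search for (s*, t*, q*) under bandwidth b_max (Eq. (10))."""
    def objective(t, q):
        s = min(s_from_budget(t, q, b_max), S_MAX)  # cap s at 4CIF
        return U(s, t, q)
    best = None
    # Simplified coarse grid of starting points over the valid (t, q) ranges.
    for t0, q0 in itertools.product((7.5, 15.0, 30.0), (28.0, 34.0, 40.0)):
        (t, q), score = pattern_search(objective, (t0, q0),
                                       lo=(1.875, 28.0), hi=(30.0, 40.0))
        if best is None or score > best[-1]:
            s = min(s_from_budget(t, q, b_max), S_MAX)
            best = (s, t, q, score)
    return best

s, t, q, score = adapt(2.0)                      # 2 Mbps budget
assert rate(s, t, q) <= 2.0 + 1e-6               # the result respects the budget
print(f"s={s:.2f} (xCIF), t={t:.1f} fps, QP={q:.1f}, predicted MOS={score:.2f}")
```

The search either spends the whole budget (Eq. (9)) or caps the resolution at 4CIF when the budget would allow more, so the bandwidth constraint holds by construction.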
We assume the rate optimization algorithm obtains the available bandwidth correctly; this can be done with a network measurement method such as the rate calculation in RFC 3448 [15]. The video sequence we select is crew, provided by HHI [16]. In the experiment, we first use the ANN assessment model developed in Section II as the video quality assessment method, and then calculate the rate estimation parameters a, b, and c. The results are shown in Table 1. To test the performance of the algorithm, a set of maximum bandwidths is set up as target bit rates, ranging from 100 kbps to 4.5 Mbps. Below 100 kbps, the video quality is hard for humans to distinguish at ordinary resolutions. Above 4.5 Mbps, the video quality is almost equal to that of the
original video. For a given bandwidth limit, we use the pattern search method to find the parameter combination that gives the maximum predicted MOS value. We denote the resulting parameters by $t^*$ and $q^*$; with $t^*$ and $q^*$, Eq. (9) gives $s^*$, which completes the optimal result.

Table 1: The rate estimate parameters

Sequence    a       b       c        B_max
crew        1.049   0.644   0.8608   4.905

We compare our algorithm with the algorithm described in [11], which serves as the baseline. The baseline algorithm summarizes a set of adaptation guidelines from subjective tests; the guidelines instruct the adaptation control unit to change the QP and frame rate when the available bandwidth increases or decreases. Fig. 2 shows the results of the two algorithms. The figure shows that our proposed algorithm outperforms the baseline: across the whole range of video bit rate limits, the MOS scores of the adapted videos produced by our algorithm are consistently higher than those of the baseline. More interestingly, the advantage of our algorithm grows as the video rate constraint is relaxed. This is because our algorithm adapts all three scalability dimensions of the video according to QoE; when more bandwidth is available, our method has more room to choose the combination of the three scalability parameters that achieves a higher QoE, while the baseline is not able to fully utilize the bandwidth, so the subjective quality of its adapted video saturates even as more bandwidth becomes available. This clearly demonstrates the effectiveness of the proposed QoE based video adaptation algorithm.

Fig. 2: QoE optimization result of the adaptation algorithms
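The rate-parameter estimation used in this experiment can be illustrated with a small sketch: because the NRS of Eq. (6) depends only on s, fitting log NRS against log(s/s_max) by least squares recovers c (and a, b follow analogously). The "measurements" below are synthetic, generated from an assumed ground-truth model rather than from JSVM output.

```python
import math

# Synthetic "measured" rates from an assumed ground-truth model, standing in
# for real JSVM measurements; the constants are illustrative only.
A0, B0, C0, RM = 1.0, 0.7, 0.9, 5.0
Q_MIN, T_MAX, S_MAX = 28.0, 30.0, 4.0  # s in CIF areas (4CIF = 4)

def true_rate(s, t, q):
    return RM * (Q_MIN / q) ** A0 * (t / T_MAX) ** B0 * (s / S_MAX) ** C0

# NRS (Eq. (6)): R(q,s,t) / R(q,s_max,t) depends only on s, so fitting
# log(NRS) against log(s/s_max) by least squares recovers the exponent c.
samples = [(1.0, 15.0, 32.0), (2.0, 15.0, 32.0), (4.0, 15.0, 32.0)]
xs = [math.log(s / S_MAX) for s, _, _ in samples]
ys = [math.log(true_rate(s, t, q) / true_rate(S_MAX, t, q)) for s, t, q in samples]

# Least-squares slope through the origin: c = sum(x*y) / sum(x*x).
c = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(f"fitted c = {c:.4f}")
```

With noise-free synthetic data the fit recovers the ground-truth exponent exactly; with real measured rates it minimizes the RMSE of the log-rates instead.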
5 Conclusion

In this paper we propose a QoE based video adaptation algorithm. We use an ANN based assessment model as the video quality evaluation method for video adaptation; the model reflects the user's experience of the video and allows the algorithm to run in real time. Experimental results show that our algorithm is superior to the conventional video adaptation algorithm in terms of QoE.

References

[1] Chang, S. F., and Vetro, A., Video adaptation: concepts, technologies, and open issues, Proceedings of the IEEE, 2005, 93, (1), pp. 148-158.
[2] Zheng, P., Fan, Y., Song, H., Jia, S., A network-aware QoS framework for video transmission over hybrid wired/wireless networks, Journal of Computational Information Systems, 2008, 4, pp. 2351-2357.
[3] Schwarz, H., Marpe, D., and Wiegand, T., Overview of the scalable video coding extension of the H.264/AVC standard, IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17, (9), pp. 1103-1120.
[4] Stankiewicz, R., Cholda, P., and Jajszczyk, A., QoX: what is it really?, IEEE Communications Magazine, 2011, 49, (4), pp. 148-158.
[5] Venkataraman, M., Chatterjee, M., Inferring video QoE in real time, IEEE Network, 2011, 25, (1), pp. 4-13.
[6] Jain, R., Quality of experience, IEEE MultiMedia, 2004, 11, (1), pp. 95-96.
[7] Mohamed, S., and Rubino, G., A study of real-time packet video quality using random neural networks, IEEE Transactions on Circuits and Systems for Video Technology, 2002, 12, (12), pp. 1071-1083.
[8] Zhai, G., Cai, J., Lin, W., Yang, X., and Zhang, W., Three dimensional scalable video adaptation via user-end perceptual quality assessment, IEEE Transactions on Broadcasting, 2008, 54, (3), pp. 719-727.
[9] Khan, A., Sun, L., and Ifeachor, E., QoE prediction model and its application in video quality adaptation over UMTS networks, IEEE Transactions on Multimedia, 2012, 14, (2), pp. 431-442.
[10] Vakili, A., and Grégoire, J.-C., QoE management for video conferencing applications, Computer Networks, 2013.
[11] Li, M., Chen, Z., and Tan, Y.-P., On quality of experience of scalable video adaptation, Journal of Visual Communication and Image Representation, 2013, 24, (5), pp. 509-521.
[12] Aguiar, E., Riker, A., Abelém, A., Cerqueira, E., and Mu, M., Video quality estimator for wireless mesh networks, in: Quality of Service (IWQoS), 2012 IEEE 20th International Workshop on, 2012, pp. 1-9.
[13] JSVM H.264/SVC Software, CVS Server, Available FTP: garcon.ient.rwth-aachen.de/cvs/jv.
[14] Ma, Z., Xu, M., Wang, Y., Modeling video rate as a function of frame size, frame rate and quantization stepsize, 2011.
[15] Handley, M., Floyd, S., Padhye, J., and Widmer, J., RFC 3448, TCP friendly rate control (TFRC): protocol specification, 2003.
[16] Video Clips, Available: ftp.tnt.uni-hannover.de/pub/svc/testsequences/.