Video is a Cube: Multidimensional Analysis and Video Quality Metrics

Christian Keimel, Martin Rothbucher, Hao Shen and Klaus Diepold
Institute for Data Processing, Technische Universität München, Arcisstr. 21, 80333 München

Abstract: Quality of Experience is becoming increasingly important in signal processing applications. Taking inspiration from chemometrics, we provide an introduction to the design of video quality metrics using data analysis methods, which differ from traditional approaches. These methods do not require a complete understanding of the human visual system. We use multidimensional data analysis, an extension of well-established data analysis techniques, which allows us to exploit higher dimensional data better. In the case of video quality metrics, it lets us exploit the temporal properties of video more properly, because the complete three dimensional structure of the video cube is taken into account in the metric design. Starting with the well-known principal component analysis and an introduction to the notation of multi-way arrays, we then present their multidimensional extensions, which deliver better quality prediction results. Although we focus on video quality, the presented design principles can easily be adapted to other modalities and to even higher dimensional datasets as well.

QUALITY OF EXPERIENCE (QoE) is a relatively new concept in signal processing that aims to describe how video, audio and multi-modal stimuli are perceived by human observers. In the field of video quality assessment, researchers are often interested in how the overall experience is influenced by different video coding technologies, transmission errors or general viewing conditions. The focus is no longer on measurable physical quantities, but rather on how the stimuli are subjectively experienced and whether they are perceived to be of acceptable quality from a subjective point of view. QoE thus contrasts with the well-established Quality of Service (QoS). There, we measure the signal fidelity, i.e. how much a signal is degraded during processing by noise or other disturbances.
This is usually done by comparing the distorted signal with the original signal, which then gives us a measure of the signal's quality. To understand why QoS is not sufficient for capturing the subjective perception of quality, let us take a quick look at the most popular metric in signal processing for measuring QoS, the mean squared error (MSE). It is known that the MSE does not correlate very well with the human perception of quality, as it only determines the difference between the pixel values of both images. The example in Fig. 1 illustrates this problem. Both images on the left have the same MSE with respect to the original image. Yet we perceive the upper image, distorted by coding artifacts, to be of worse visual quality than the lower image, in which we only changed the contrast slightly. Further discussions of this problem can be found in [1].

I. HOW TO MEASURE QUALITY OF EXPERIENCE

How, then, can we measure QoE? The most direct way is to conduct tests with human observers, who judge the visual quality of video material and thus provide information about the subjectively perceived quality. In real life, however, we face a problem: these tests are time consuming and quite expensive. This is not only because a limited number of subjects can take part in a test at the same time, but also because a multitude of different test cases has to be considered. Apart from these more logistical problems, subjective tests are usually not suitable if the video quality has to be monitored in real time.

To overcome this difficulty, video quality metrics are designed and used. The aim is to approximate human quality perception as closely as possible with objectively measurable properties of the videos. Obviously, there is no single measurable quantity that by itself can represent the perceived QoE. Nevertheless, we can determine some aspects which are expected or shown to be related to the perception of quality, and use these to design an appropriate video quality metric.

II.
DESIGN OF VIDEO QUALITY METRICS - THE TRADITIONAL APPROACH

In the traditional approach, quality metrics aim to implement the spatial and temporal characteristics of the human visual system (HVS) as faithfully as possible. Many aspects of the HVS are not sufficiently understood, and therefore a comprehensive model of the HVS can hardly be built. Nevertheless, at least parts of the HVS can be described well enough to utilize these properties in video quality metrics. According to Winkler [2], there are in general two different ways to exploit these properties: either a psychophysical or an engineering approach.

The psychophysical approach relies primarily on a (partial) model of the HVS and tries to exploit known psychophysical effects, e.g. masking effects, adaptation and contrast sensitivity. One advantage of this approach is that we are not limited to a specific coding technology or application scenario, as we implement an artificial observer with properties of the HVS. In Daly's visible differences predictor [3], for example, the adaptation of the HVS to different light levels is taken into account first, followed by an orientation dependent contrast sensitivity function, and finally models of the different detection mechanisms of the HVS are applied. This full-reference predictor
then delivers a measure for the perceptual difference in the same areas of two images. Further representatives of this approach are Lubin's visual discrimination model [4], the Sarnoff Just Noticeable Difference (JND) model [5] and Winkler's perceptual distortion metric [6].

Fig. 1: Images with same MSE, but different visual quality (left), and how a model is built with data analysis (right): subjective testing and feature extraction for each of the n video sequences yield the quality vector y and the feature matrix X, related by the weights b as y = Xb.

In the engineering approach, the properties of the HVS are not implemented directly; instead, features known to be correlated with the perceived visual quality are determined. These features are then extracted and used in the video quality metric. As no in-depth understanding of the HVS is needed, this type of metric is commonly used in current research. In contrast to the psychophysical approach, however, we are limited to predefined coding technologies or application scenarios, as we do not construct an artificial observer, but rather derive the features from artifacts introduced by the processing of the videos. One example of such a feature is blocking, as seen in Fig. 1. This feature is well known, as it is especially noticeable in highly compressed video. It is caused by block-based transforms such as the Discrete Cosine Transform (DCT) or the integer transform in the current video encoding standards MPEG-2 and H.264/AVC, respectively. With this feature, we exploit the knowledge that human perception is sensitive to edges, and therefore assume that artificial edges introduced by the encoding result in a degraded perceived quality. Usually, more than one feature is extracted, and these features are then combined into one value by using assumptions about the HVS. Typical representatives of this approach are the widely used Structural Similarity (SSIM) index by Wang et al. [7], the video quality metric by Wolf and Pinson [8] and Watson's digital video quality metric [9].
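To make the blocking feature more tangible, here is a minimal sketch in Python with NumPy (our choice; the article prescribes no implementation). It assumes a grayscale frame and an 8 × 8 block grid aligned with the frame origin, and for brevity it only looks at horizontal neighbours. The function name `blockiness` and the ratio form are hypothetical illustrations, not the feature used in any of the cited metrics.

```python
import numpy as np


def blockiness(frame: np.ndarray, block: int = 8) -> float:
    """Hypothetical blocking feature (illustration only).

    Ratio of the mean absolute horizontal luminance step across the
    boundaries of an assumed block grid to the mean step inside blocks.
    Values well above 1 hint at visible block edges.
    """
    f = frame.astype(float)
    # diff[:, j] = |f[:, j + 1] - f[:, j]|, the step between columns j and j + 1.
    diff = np.abs(np.diff(f, axis=1))
    cols = np.arange(diff.shape[1])
    # A block boundary lies between columns block*k - 1 and block*k.
    at_boundary = (cols + 1) % block == 0
    across = diff[:, at_boundary].mean()
    inside = diff[:, ~at_boundary].mean()
    return float(across / (inside + 1e-9))  # small constant avoids division by zero


# A smooth horizontal ramp has no block structure ...
smooth = np.tile(np.arange(64, dtype=float), (64, 1))
# ... whereas constant 8x8 tiles mimic coarse quantization of a block transform.
tiles = np.arange(64, dtype=float).reshape(8, 8) * 4.0
blocky = np.kron(tiles, np.ones((8, 8)))

print(round(blockiness(smooth), 3), blockiness(blocky) > 100)
```

A ratio close to 1 means the assumed block boundaries are no more pronounced than the image content elsewhere. A usable feature would also examine vertical boundaries and would have to follow the block grid of the particular codec.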
Moreover, we can distinguish between full-reference, reduced-reference and no-reference metrics, for which we have available either the undistorted original, some meta information about the undistorted original, or only the distorted video, respectively. We refer to [10] for further information on the HVS, and to [11], [12] for an overview of current video quality metrics.

In general, exploiting more properties of the HVS, or their corresponding features, in a metric allows us to model the perception of quality better. However, since the HVS is not completely understood and, consequently, no explicit model describing all its aspects is available in the community, it is not obvious how the features should be combined. But do we really need to know a priori how to combine the features?

III. AN ALTERNATIVE METHOD: DATA ANALYSIS

Sometimes it is helpful to look at other disciplines. Video quality estimation is not the only application area in which we want to quantify something that is not directly accessible for measurement. Similar problems often occur in chemistry and related research areas. In food science, for example, researchers face a comparable problem: they want to quantify the taste of samples, but taste is not directly measurable. The classic example is the determination of the perfect mixture for the best-tasting hot chocolate. One can measure the milk, sugar or cocoa content, but there is no a-priori physical model that allows us to define the resulting taste. To solve this problem, a data-driven approach is applied: instead of making explicit assumptions about the overall system and the relationship between the dependent variable, e.g. taste, and the influencing variables, e.g. milk, sugar and cocoa, the input and output variables are analyzed. In this way, we obtain models purely through the analysis of the data. In chemistry, this is known as chemometrics, and it has been applied successfully to many problems in this field over the last three decades. It provides a powerful tool for the analysis and prediction of systems that are understood only to a limited degree.
So far, this method is not well known in the context of video quality assessment, or even of multimedia quality assessment in general. A good introduction to chemometrics can be found in [13].

By applying this multivariate data analysis to video quality, we now consider the HVS as a black box, and therefore do not assume a complete understanding of it. The input corresponds to the features we can measure, and the output of the box to the perceived visual quality obtained in subjective tests. Firstly, we extract m features from an image or video frame I, resulting in a $1 \times m$ row vector x. While this is similar to
the engineering approach described in the previous section, an important difference is that we make no assumptions about the relationship between the features themselves, nor about how they are combined into a quality value.

In general, we should not limit the number of selected features unnecessarily. Or, to quote Martens and Martens [13], "Beware of wishful thinking!" As we do not have a complete understanding of the underlying system, it can be fatal to exclude some features before conducting any analysis because we consider them to be irrelevant. Moreover, data that can be objectively extracted, like the features in our case, is usually cheap, or at least less expensive to generate than subjective data gained in tests. If some features are irrelevant to the quality, we will find out during the analysis. Of course, it is only sensible to select features that have some verified, or at least some suspected, relation to the human perception of visual quality. For example, we could measure the room temperature, but it is highly unlikely that room temperature has any influence in our case.

For n different video sequences, we extract a corresponding feature vector x for each sequence and thus get an $n \times m$ matrix X, where each row describes a different sequence or sample and each column describes a different feature, as shown in Fig. 1. We obtain a subjective quality value for each of the n sequences by subjective testing and get an $n \times 1$ column vector y that represents our ground truth. Based on this dataset, a model can be generated that explains the subjectively perceived quality with objectively measurable features. Our aim is now to find an $m \times 1$ column vector b that relates the features in X to our ground truth in y, i.e. that provides the weight of each feature needed to obtain the corresponding visual quality. This process is called calibration or training of the model, and the sequences used form the training set. We can then also use b to predict the quality of new, previously unknown sequences.
The benefit of this approach is that we are able to combine totally different features into one metric without knowing their proportional contribution to the overall perception of quality beforehand.

IV. CLASSIC AND WELL KNOWN: LINEAR REGRESSION

One classic approach to estimating the weight vector b is a simple multiple linear regression model, i.e.

$$y = Xb + \epsilon, \tag{1}$$

where $\epsilon$ is the error term. Without loss of generality, the data matrix X can be assumed to be centered, i.e. to have zero column means, and the video quality values in y are likewise assumed to be centered. Using least squares estimation, we obtain the estimate

$$\hat{b} = (X^\top X)^{+} X^\top y, \tag{2}$$

where $Z^{+}$ denotes the Moore-Penrose pseudo-inverse of a matrix Z. We use the pseudo-inverse because we cannot assume that the columns of X representing the different features are linearly independent, and therefore $X^\top X$ can be rank deficient. For an unknown video sequence $V_U$ with the corresponding feature vector $x_u$, we are then able to predict its visual quality $\hat{y}_u$ with

$$\hat{y}_u = x_u \hat{b}. \tag{3}$$

Yet this simple approach has a drawback: in the estimation of the weights, we implicitly assume that all features are equally important. Clearly, this will not always be the case, as some features may have a larger variance than others.

V. AN IMPROVEMENT: PRINCIPAL COMPONENT REGRESSION

We can address the aforementioned issue by selecting the weights in the model so that they take into account the influence of the individual features on the variance in the feature matrix X. We are therefore looking for so-called latent variables, which are not directly represented by the measured features themselves, but rather by a hidden combination of them. In other words, we aim to reduce the dimensionality of our original feature space to a more compact representation that better fits our latent variables. One well-known method is Principal Component Analysis (PCA), which extracts the latent variables as the Principal Components (PCs). The PCs are expected to preserve the variance of the original data. We then perform a regression on some of these PCs, leading to principal component regression (PCR).
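For reference, the least-squares estimate (2) and the prediction (3) of Section IV can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions: synthetic data, centering done inside the hypothetical helper `fit_mlr`, and an explicit cut-off `rcond` that discards numerically zero singular values. The duplicated fourth column shows why the pseudo-inverse is needed when features are linearly dependent.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray):
    """Least-squares weights on centered data, cf. (2): b = (X^T X)^+ X^T y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # The pseudo-inverse copes with a rank-deficient X^T X (dependent features);
    # rcond sets singular values below 1e-10 * largest to zero.
    b = np.linalg.pinv(Xc.T @ Xc, rcond=1e-10) @ Xc.T @ yc
    return b, x_mean, y_mean


def predict_mlr(x_u: np.ndarray, b: np.ndarray, x_mean: np.ndarray, y_mean: float) -> float:
    """Prediction y_u = x_u b, cf. (3), with the centering undone."""
    return float((x_u - x_mean) @ b + y_mean)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X = np.column_stack([X, X[:, 0]])               # 4th feature duplicates the 1st
y = 3.0 + X @ np.array([1.0, -2.0, 0.5, 0.0])   # synthetic, noise-free scores

b, x_mean, y_mean = fit_mlr(X, y)
x_u = np.array([0.2, 0.1, -0.3, 0.2])
print(np.round(b, 3), round(predict_mlr(x_u, b, x_mean, y_mean), 3))
```

Because of the duplicated column, $X^\top X$ is singular; the pseudo-inverse then returns the minimum-norm solution, which splits the weight of the first feature equally between the two identical columns while leaving the predictions unaffected.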
As PCA is a well-known method, we just briefly recap some basics. Let X be a (centered) $n \times m$ data matrix. We define $r = \min\{n, m\}$ and, using a singular value decomposition (SVD), obtain the factorization

$$X = U D P^\top, \tag{4}$$

where U is an $n \times r$ matrix with r orthonormal columns, P is an $m \times r$ matrix with r orthonormal columns, and D is an $r \times r$ diagonal matrix. P is called the loadings matrix, and its columns $p_1, \dots, p_r$ are called loadings. They are the eigenvectors of $X^\top X$. Furthermore, we define the scores matrix

$$T = UD = XP. \tag{5}$$

The basic idea behind PCR is to approximate X using only the first g columns of T and P, which correspond to the g largest eigenvalues of $X^\top X$ and thus to the first g PCs. We thereby assume that the largest g eigenvalues describe the variance in our data matrix X sufficiently, so that we can discard the smaller eigenvalues. If g is smaller than r, the model is built with a reduced rank. Usually, we aim to explain at least 80-90% of the variance in X, but other selection criteria are also possible. Our regression model with the first g PCs can thus be written as

$$y = T_g c, \tag{6}$$

where $T_g$ is the matrix consisting of the first g columns of T, and c is the (unknown) weight vector. Once again, we perform a multiple linear regression and estimate c with the least squares method as

$$\hat{c} = (T_g^\top T_g)^{-1} T_g^\top y. \tag{7}$$
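A minimal PCR sketch following (4)-(7), including the mapping back to the feature space given in (8), might look as follows in Python with NumPy. The helper names `choose_g` and `fit_pcr`, the 90% explained-variance rule for choosing g and the synthetic data (two hidden factors driving six correlated features) are our own illustrative assumptions.

```python
import numpy as np


def choose_g(X: np.ndarray, threshold: float = 0.9) -> int:
    """Smallest g whose first g PCs explain at least `threshold` of the variance."""
    d = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    explained = np.cumsum(d ** 2) / np.sum(d ** 2)
    return int(np.searchsorted(explained, threshold) + 1)


def fit_pcr(X: np.ndarray, y: np.ndarray, g: int):
    """Principal component regression on centered data, following (4)-(8)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Thin SVD X = U D P^T, cf. (4); the rows of Vt are the loadings p_1..p_r.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P_g = Vt[:g].T                                     # m x g loadings
    T_g = Xc @ P_g                                     # first g score columns, cf. (5)
    c_hat = np.linalg.solve(T_g.T @ T_g, T_g.T @ yc)   # cf. (7)
    b_hat = P_g @ c_hat                                # back to feature space, cf. (8)
    return b_hat, x_mean, y_mean


rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))                      # two hidden quality factors
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(50, 6))  # six correlated features
y = 4.0 + latent @ np.array([1.0, -1.0])

g = choose_g(X, threshold=0.9)
b_hat, x_mean, y_mean = fit_pcr(X, y, g)
y_fit = (X - x_mean) @ b_hat + y_mean
print(g, np.max(np.abs(y_fit - y)))
```

Since $T_g^\top T_g = D_g^2$ is diagonal, the solve in (7) is well conditioned as long as the retained singular values are not tiny, which is exactly what discarding the small eigenvalues ensures.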
In the end, we are interested in the weights, so that we can directly calculate the visual quality. Therefore, we determine the estimated weight vector as

$$\hat{b} = P_g \hat{c}, \tag{8}$$

with $P_g$ denoting the matrix consisting of the first g columns of P. We can then predict the visual quality of an unknown video sequence $V_U$ with the corresponding feature vector $x_u$ using (3). PCA was first used in the design of video quality metrics by Miyahara [14]. We refer to [15] for further information on PCA and PCR.

A more sophisticated method often used in chemometrics is partial least squares regression (PLSR). This method takes into account the variance in the subjective quality vector y as well as the variance in the feature matrix X. PLSR has been used in the design of video quality metrics in e.g. [16]. Further information on PLSR itself can be found in [13] and [17].

VI. VIDEO IS A CUBE

The temporal dimension is the main difference between still images and video. In the previous sections, we assumed that we extract the feature vector from only one image or one video frame, which is a two dimensional matrix. In other words, video was considered to be just a simple extension of still images. This omission is not unique to this article: the temporal dimension is quite often neglected in contributions to the field of video quality metrics. The additional dimension is usually handled by temporal pooling. Either the features themselves are temporally pooled into one feature value for the whole video sequence, or the metric is applied to each frame of the video separately and the metric's values are then pooled temporally over all frames to obtain one value, as illustrated in Fig. 2. Pooling is mostly done by averaging, but other simple statistical functions are also employed, such as the standard deviation, the 10% and 90% percentiles, the median, or the minimum and maximum. Even if a metric considers not only the current frame but also preceding or succeeding frames, e.g. with a 3D filter [18] or spatiotemporal tubes [19], the overall pooling is still done with one of the above functions.

Fig. 2: Temporal pooling: the feature vectors x of the individual frames of a video are combined by a pooling function into a single feature vector x.
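The pooling functions listed above are one-liners. The sketch below (Python with NumPy, using our own hypothetical helper `pool_features` and dictionary labels) applies them to a $t \times m$ matrix of per-frame features. The two toy sequences are constructed to have identical means, which illustrates how averaging can hide temporal behaviour.

```python
import numpy as np

# Pooling functions named in the text; the dictionary keys are our own labels.
POOLING = {
    "mean": lambda F: F.mean(axis=0),
    "std": lambda F: F.std(axis=0),
    "p10": lambda F: np.percentile(F, 10, axis=0),
    "p90": lambda F: np.percentile(F, 90, axis=0),
    "median": lambda F: np.median(F, axis=0),
    "min": lambda F: F.min(axis=0),
    "max": lambda F: F.max(axis=0),
}


def pool_features(frame_features: np.ndarray, method: str = "mean") -> np.ndarray:
    """Collapse a (t x m) matrix of per-frame feature vectors into one m-vector."""
    return POOLING[method](np.asarray(frame_features, dtype=float))


# Two hypothetical 10-frame sequences with a single feature each:
steady = np.full((10, 1), 2.0)                                 # constant over time
bursty = np.vstack([np.zeros((5, 1)), np.full((5, 1), 4.0)])   # abrupt change halfway

print(pool_features(steady), pool_features(bursty))                # identical means
print(pool_features(steady, "std"), pool_features(bursty, "std"))  # std tells them apart
```

Even the standard deviation cannot tell whether the change happens at the beginning or at the end of the sequence: reversing `bursty` in time leaves every pooled value above unchanged, because all of these functions ignore the temporal order.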
But this arbitrary pooling, especially averaging, obscures the influence of temporal distortions on the human perception of quality, as intrinsic dependencies and structures in the temporal dimension are disregarded. The importance of the temporal properties of video features in the design of video quality metrics was recently shown in [20]: omitting the temporal pooling step and introducing the additional temporal dimension directly into the design of video quality metrics can improve the prediction performance.

We therefore propose to consider video in its natural three dimensional structure, as a video cube. Extending the data analysis approach, we add a dimension to our dataset and thus arrive at multidimensional data analysis, an extension of two dimensional data analysis. In doing so, we gain a better understanding of the video's properties and are thus able to interpret the extracted features better. We no longer employ an a-priori temporal pooling step, but use the whole video cube to generate the prediction model for the visual quality, and thus consider the temporal dimension of video more appropriately.

VII. TENSOR NOTATION

Before moving on to multidimensional data analysis, we briefly introduce the notation for handling multi-way arrays, or tensors. In general, our video cube can be represented as a three-way $u \times v \times t$ array $\mathcal{V}(:,:,:)$, where u and v are the frame dimensions and t is the number of frames. Similarly, we can extend the two dimensional feature matrix X into the temporal dimension as an $n \times m \times t$ three-way array, or feature cube, $\mathcal{X}$. Both are shown in Fig. 3. In this work, we denote by $\mathcal{X}(i,j,k)$ the (i,j,k)-th entry of $\mathcal{X}$, by $\mathcal{X}(i,j,:)$ the vector of $\mathcal{X}$ with a fixed pair (i,j), referred to as a tensor fiber, and by $\mathcal{X}(i,:,:)$ the matrix of $\mathcal{X}$ with a fixed index i, referred to as a tensor slice. The different fibers and slices are shown in Fig. 4. For more information about tensors and multi-way arrays, see [21]; for multi-way data analysis, refer to [22], [23].

VIII. UNFOLDING TIME

The easiest way to apply conventional data analysis methods to tensor data is to represent tensors as matrices.
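In array terms, the fibers and slices of Section VII and the mode-1 unfolding used in this section reduce to indexing and concatenation. The sketch below uses Python with NumPy, whose indices are 0-based while the text counts from 1; the toy cube sizes and the helper `unfold_mode1` are illustrative assumptions.

```python
import numpy as np

n, m, t = 3, 4, 5  # sequences, features, frames (toy sizes)
X = np.arange(n * m * t, dtype=float).reshape(n, m, t)  # toy feature cube

# The text uses 1-based indices; NumPy is 0-based.
entry = X[0, 1, 2]              # X(1,2,3): one entry
tube_fiber = X[0, 1, :]         # X(1,2,:): feature 2 of sequence 1 over all t frames
horizontal_slice = X[0, :, :]   # X(1,:,:): m x t matrix of sequence 1
frontal_slice = X[:, :, 2]      # X(:,:,3): n x m feature matrix of frame 3


def unfold_mode1(cube: np.ndarray) -> np.ndarray:
    """Mode-1 unfolding keeping the temporal order: the frontal slices
    cube[:, :, 0], ..., cube[:, :, t-1] are placed side by side (n x m*t)."""
    return np.concatenate([cube[:, :, k] for k in range(cube.shape[2])], axis=1)


X_unfold = unfold_mode1(X)
print(X_unfold.shape)  # (3, 20)
```

Each row of `X_unfold` is then a $1 \times (m t)$ feature vector that can be fed to the PCR of Section V unchanged.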
Fig. 3: Video cube ($u \times v \times t$) and feature cube ($n \times m \times t$).

Representing a tensor as a matrix means rearranging the elements of a tensor or multi-way array into
entries of a matrix. Such a process is known as unfolding, matricization, or flattening. In our setting, we are interested in the temporal dimension and therefore perform the mode-1 unfolding of our three-way array $\mathcal{X}(i,j,k)$. We thus obtain a new $n \times (m t)$ matrix $X_{\text{unfold}}$, whose columns are the mode-1 fibers of $\mathcal{X}(i,j,k)$. For simplicity, we assume that the temporal order is maintained in $X_{\text{unfold}}$, i.e. the frontal slices $\mathcal{X}(:,:,1), \dots, \mathcal{X}(:,:,t)$ are placed side by side. The structure of this new matrix is shown in Fig. 5. We then perform a PCR on this matrix as described previously and obtain a model of the visual quality. Finally, we can predict the visual quality of an unknown video sequence $V_U$ with its feature vector $x_u$ by using (3). Note that $x_u$ now has dimension $1 \times (m t)$.

One disadvantage of the model building step with PCR is that the SVD must be performed on a rather large matrix. Depending on the number of frames in the video sequence, the time needed for model building can increase by a factor of $10^3$ or more. More importantly, we still lose some information about the variance, because unfolding destroys the temporal structure.

Fig. 4: Tensor notation: fibres (left), i.e. column (mode-1) fibre $\mathcal{X}(:,2,1)$, row (mode-2) fibre $\mathcal{X}(1,:,2)$ and tube (mode-3) fibre $\mathcal{X}(3,3,:)$; slices (right), i.e. horizontal slice $\mathcal{X}(1,:,:)$, lateral slice $\mathcal{X}(:,2,:)$ and frontal slice $\mathcal{X}(:,:,1)$.

Fig. 5: Unfolding of the feature cube: the frontal slices $\mathcal{X}(:,:,1)$, $\mathcal{X}(:,:,2)$, ..., $\mathcal{X}(:,:,t)$ are placed side by side.

IX. 2D PRINCIPAL COMPONENT REGRESSION

Instead of unfolding, we can include the temporal dimension directly in the data analysis by performing a multidimensional data analysis. We use the two-dimensional extension of PCA (2D-PCA), recently proposed by Yang et al. [24], in combination with least squares regression, and call the result 2D-PCR. For a video sequence with t frames, we can cut the $n \times m \times t$ feature cube into t slices, where each slice represents one frame. Without loss of generality, we can compute the covariance, or $m \times m$ scatter matrix, as

$$X_{Sc} = \frac{1}{t} \sum_{i=1}^{t} \mathcal{X}(:,:,i)^\top\,\mathcal{X}(:,:,i), \tag{9}$$

where, by abuse of notation, $\mathcal{X}(:,:,i)$ denotes the centered data matrix. $X_{Sc}$ therefore describes the average covariance over the temporal dimension.
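The scatter matrix (9), together with the per-slice steps stated next in (10)-(13), can be sketched compactly in Python with NumPy. This is our own minimal reading of 2D-PCR, not the authors' implementation: we center each slice over the n sequences, use the same subjective score vector y as the target for every frame, and check the result only in-sample on synthetic data. The helper names `fit_2dpcr` and `predict_2dpcr` are hypothetical.

```python
import numpy as np


def fit_2dpcr(X: np.ndarray, y: np.ndarray, g: int):
    """2D-PCR sketch for an (n x m x t) feature cube X and n subjective scores y.

    Returns the per-frame weights B as an (m x t) array (column i holds B(:,1,i))
    and the centering terms needed for prediction.
    """
    n, m, t = X.shape
    x_mean = X.mean(axis=0)          # (m x t): mean over sequences per feature and frame
    Xc = X - x_mean                  # every slice X(:, :, i) is now centered
    y_mean = y.mean()
    yc = y - y_mean
    # Scatter matrix (9): average of X(:,:,i)^T X(:,:,i) over the t slices (m x m).
    S = sum(Xc[:, :, i].T @ Xc[:, :, i] for i in range(t)) / t
    P, _, _ = np.linalg.svd(S)       # S is symmetric PSD: columns of P are its eigenvectors
    P_g = P[:, :g]                   # first g loadings (m x g)
    B = np.empty((m, t))
    for i in range(t):
        T_i = Xc[:, :, i] @ P_g                          # score slice, cf. (10)
        c_i = np.linalg.pinv(T_i.T @ T_i) @ T_i.T @ yc   # g x 1 weights, cf. (11)
        B[:, i] = P_g @ c_i                              # back to feature space, cf. (12)
    return B, x_mean, y_mean


def predict_2dpcr(X_u: np.ndarray, B: np.ndarray, x_mean: np.ndarray, y_mean: float) -> np.ndarray:
    """Per-frame predictions (13) for one sequence given as an (m x t) array."""
    return np.sum((X_u - x_mean) * B, axis=0) + y_mean


rng = np.random.default_rng(2)
n, m, t, g = 30, 4, 6, 2
# Two dominant features plus two weak ones, drifting slightly from frame to frame.
base = rng.normal(size=(n, m)) * np.array([3.0, 2.0, 0.1, 0.1])
X = base[:, :, None] + 0.01 * rng.normal(size=(n, m, t))
y = 3.0 + base @ np.array([1.0, 0.5, 0.0, 0.0])

B, x_mean, y_mean = fit_2dpcr(X, y, g)
per_frame = predict_2dpcr(X[0], B, x_mean, y_mean)   # one prediction per frame
print(per_frame.round(2), round(float(per_frame.mean()), 2), round(float(y[0]), 2))
```

Averaging `per_frame` corresponds to pooling the slice-wise predictions into one value for the whole sequence; other pooling functions could be used instead, or the per-frame values could be kept.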
We then perform the SVD of $X_{Sc}$ to extract the PCs, similar to the one dimensional PCR in (4), and denote by $P_g$ the first g columns of the resulting loadings matrix. Instead of a scores matrix T, we now have a three-way $n \times g \times t$ scores array $\mathcal{T}_g(:,:,:)$, with each slice defined as

$$\mathcal{T}_g(:,:,i) = \mathcal{X}(:,:,i)\,P_g. \tag{10}$$

Similar to (7), we then estimate a $g \times 1$ prediction weight vector for each slice with the first g principal components as

$$\hat{\mathcal{C}}(:,:,i) = \left(\mathcal{T}_g(:,:,i)^\top\,\mathcal{T}_g(:,:,i)\right)^{+} \mathcal{T}_g(:,:,i)^\top\,y, \tag{11}$$

where y is the $n \times 1$ vector of subjective quality values, before expressing the weights in our original feature space with an $m \times 1 \times t$ three-way array

$$\mathcal{B}(:,:,i) = P_g\,\hat{\mathcal{C}}(:,:,i), \tag{12}$$

comparable to (8) for the one dimensional PCR. Note that the weights are now represented by a (rotated) matrix. A quality prediction for the i-th slice can then be performed in the same manner as in (3), i.e.

$$\hat{y}_u(i) = \mathcal{X}_u(:,:,i)\,\mathcal{B}(:,:,i), \tag{13}$$

where $\mathcal{X}_u$ is the $1 \times m \times t$ feature array of one sequence and $\hat{y}_u$ the $1 \times t$ vector of predicted qualities. We can now use this quality prediction individually for each slice, or generate one quality value for the whole video sequence by pooling. So far, 2D-PCR has been used for video quality metrics in [25].

X. CROSS VALIDATION

In general, data analysis methods require a training phase and thus a training set. One important aspect is to employ a separate data set for the validation of the designed metric. Using the same data set for training and validation usually gives misleading results: not surprisingly, our metric performs excellently on its training data, whereas for unknown video sequences the prediction quality could be very poor. But
as mentioned previously, the data for video quality metrics is usually expensive to generate, as we have to conduct subjective tests. Hence, we cannot really afford to use only a subset of all available data for model building, since the more training data we have, the better the prediction abilities of our metric will be. This problem can be partially avoided by performing a cross validation, e.g. leave-one-out. This allows us to use all available data for training, while still using the same data set for the validation of the metric. Assuming we have i video sequences with different content, we use $i - 1$ video sequences for training and the left-out sequence for validation. Repeating this for every sequence, we eventually get i models. This is illustrated in Fig. 6. The general model can then be obtained by different methods, e.g. by averaging the weights over all models or by selecting the model with the best prediction performance. For more information on cross validation in general, we refer to [13] and [26].

Fig. 6: Cross validation: splitting up the data set into training and validation set (left), building different models with each combination (center) and quality prediction of a metric, i.e. predicted vs. subjective quality with and without cross validation (right).

XI. USING DATA ANALYSIS: AN EXAMPLE METRIC

But do more dimensions really help us design better video quality metrics? In order to compare the data analysis approaches presented in this work, we design, as an example, a simple metric for estimating the visual quality of coded video with each method. The cheapest objective data available for an encoded video can be found directly in its bitstream. Even though we do not know a priori which of the bitstream's properties are more important and which are less, we can safely assume that they are related in some way to the perceived visual quality. How they are related will be determined by data analysis.
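The comparison in Section XII relies on leave-one-out cross validation over the source videos and on the Pearson and Spearman correlations, the RMSE and the share of predictions outside the rating scale. A minimal sketch of this evaluation loop in Python with NumPy follows. It rests on several assumptions of ours: the pseudo-inverse least-squares fit stands in for any of the presented methods, the data are synthetic (11 hypothetical source videos with five coded versions each), and ties in the Spearman ranks are broken by order rather than averaged, unlike in a full implementation.

```python
import numpy as np


def fit_ls(X, y):
    """Stand-in model: least squares on centered data via the pseudo-inverse, cf. (2)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    b = np.linalg.pinv(X - x_mean) @ (y - y_mean)
    return b, x_mean, y_mean


def predict_ls(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean


def leave_one_group_out(X, y, groups, fit=fit_ls, predict=predict_ls):
    """Hold out all data points of one source video at a time, train on the rest."""
    y_hat = np.empty(len(y))
    models = []
    for g in np.unique(groups):
        held_out = groups == g
        model = fit(X[~held_out], y[~held_out])
        y_hat[held_out] = predict(model, X[held_out])
        models.append(model)
    return y_hat, models


def ranks(a):
    """Ranks 0..n-1; ties are broken by order, not averaged, in this sketch."""
    r = np.empty(len(a))
    r[np.argsort(a, kind="stable")] = np.arange(len(a))
    return r


def evaluate(y, y_hat, lo=1.0, hi=5.0):
    """Pearson, Spearman (Pearson on ranks), RMSE and share outside [lo, hi]."""
    return {
        "pearson": np.corrcoef(y, y_hat)[0, 1],
        "spearman": np.corrcoef(ranks(y), ranks(y_hat))[0, 1],
        "rmse": np.sqrt(np.mean((y - y_hat) ** 2)),
        "outside": np.mean((y_hat < lo) | (y_hat > hi)),
    }


# Synthetic stand-in data: 11 source videos with 5 coded versions each.
rng = np.random.default_rng(3)
groups = np.repeat(np.arange(11), 5)
X = rng.normal(size=(55, 4))
y = np.clip(3.0 + X @ np.array([0.6, -0.4, 0.2, 0.0]) + 0.1 * rng.normal(size=55), 1.0, 5.0)

y_hat, models = leave_one_group_out(X, y, groups)
scores = evaluate(y, y_hat)
b_general = np.mean([b for b, _, _ in models], axis=0)  # one option: average the weights
print(len(models), {k: round(float(v), 2) for k, v in scores.items()})
```

Other methods can be plugged in by passing different `fit` and `predict` callables; averaging the weights over all models (`b_general`) is one of the options for obtaining a general model mentioned in Section X.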
In this example, we use videos encoded with the popular H.264/AVC standard, which is currently used in many applications, from high definition HDTV to internet based IPTV. For each frame, we extract 16 different features describing the partitioning into different block sizes and types, the properties of the motion vectors and, lastly, the quantization, similar to the metric proposed in [27]. Each frame is thus represented by a $1 \times 16$ feature vector x. Note that no further preprocessing of the bitstream features was done. Alternatively, one can also extract features that are independent of the coding technology used, e.g. blocking or blurring, as described in [16].

Of course, we also need subjective quality values for these encoded videos as ground truth in order to perform the data analysis. Different methodologies, as well as the requirements on the test set-up and equipment for obtaining this data, are described in international standards, e.g. ITU-R BT.500 or ITU-T P.910. Another possibility is to use existing, publicly available datasets containing both the encoded videos and the visual quality values. One advantage of using such datasets is that different metrics can be compared more easily. For this example, we use a dataset provided by IT-IST [28]. It consists of eleven videos in CIF resolution (352 × 288) at a frame rate of 30 frames per second, as shown in Fig. 7. They cover a wide range of content types at bitrates from 64 kbit/s to 2000 kbit/s, providing a wide visual quality range with n = 52 data points in total, which leads to a $52 \times 1$ quality vector y. According to [28], the test was conducted using the DCR double stimulus method described in ITU-T P.910. For each data point, the test subjects were shown the undistorted original video, followed by the distorted encoded video, and were then asked to assess the impairment of the coded video with respect to the original on a discrete five point mean opinion score (MOS) scale from 1, very annoying, to 5, imperceptible. For more information on H.264/AVC in general, we refer to [29], and for the H.264/AVC feature extraction to [27]. A comprehensive list of publicly available datasets is provided in [30].

XII.
MORE DIMENSIONS ARE REALLY BETTER

Finally, we compare the four video quality metrics, each designed with one of the presented methods. Using cross validation, we design eleven different models for each method. Each model is trained using ten video sequences, and the left-out sequence is then used to validate the model built with the training set. Hence, we can measure the prediction performance of the models for unknown video sequences. The performance of the different models is compared by calculating the Pearson correlation and the Spearman rank order correlation between the subjective visual quality and the
quality predictions. The Pearson correlation indicates the prediction accuracy of the model, and the Spearman rank order correlation indicates how much the ranking of the sequences changes between the predicted and the subjective quality. Additionally, we determine the root mean squared error (RMSE) between prediction and ground truth, as well as the percentage of predictions that fall outside the quality scale from 1 to 5 used in the test.

Fig. 7: Test videos from top to bottom, left to right: City, Coastguard, Container, Crew, Football, Foreman, Mobile, Silent, Stephan, Table Tennis and Tempete.

Comparing the results in Fig. 8 and Table I, we can see that a better inclusion of the temporal dimension in the model building helps to improve the prediction quality. Note that this improvement was achieved very easily, as we did nothing else but change the data analysis method. In each step, we exploit the variation in our data better: first just within our temporally pooled features, with the step from multiple linear regression to PCR, and then with the step into the third dimension, with unfolding and 2D-PCR.

XIII. SUMMARY

In this work, we provided an introduction to the world of data analysis, and especially to the benefits of multidimensional data analysis in the design of video quality metrics. Our example showed that even for a very basic metric, multidimensional data analysis can considerably improve the prediction of the Quality of Experience. Although this introduction only covered the quality of video, the proposed methods can obviously be extended to more dimensions and/or other areas of application. It is interesting to note that the dimensions need not be spatial or temporal; they may also represent different modalities, or perhaps even a further segmentation of the existing feature spaces.

REFERENCES
[1] Z. Wang and A. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, Jan. 2009.
[2] S. Winkler, Digital Video Image Quality and Perceptual Coding. CRC Press, 2006, ch.
Perceptual Video Quality Metrics - A Review, pp. 155-179.
[3] S. J. Daly, Digital Images and Human Vision. MIT Press, 1993, ch. The visible differences predictor: An algorithm for the assessment of image fidelity, pp. 179-206.
[4] J. Lubin, Vision Models for Target Detection and Recognition. World Scientific Publishing, 1995, ch. A visual discrimination model for imaging system design and evaluation, pp. 245-283.
[5] J. Lubin and D. Fibush, Sarnoff JND vision model, T1A1.5 Working Group, ANSI T1 Standards Committee Std., 1997.
[6] S. Winkler, Digital Video Quality - Vision Models and Metrics. Wiley & Sons, 2005.
[7] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[8] S. Wolf and M. H. Pinson, "Spatial-temporal distortion metric for in-service quality monitoring of any digital video system," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Multimedia Systems and Applications II, vol. 3845, Nov. 1999, pp. 266-277.
[9] A. B. Watson, "Toward a perceptual video-quality metric," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Human Vision and Electronic Imaging III, vol. 3299, Jan. 1998, pp. 139-147.
[10] B. A. Wandell, Foundations of Vision. Sinauer Associates, 1996.
[11] H. R. Wu and K. R. Rao, Eds., Digital Video Image Quality and Perceptual Coding. CRC Press, 2006.
[12] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, ser. Synthesis Lectures on Image, Video, and Multimedia Processing. Morgan & Claypool Publishers, 2006.
[13] H. Martens and M. Martens, Multivariate Analysis of Quality. Wiley & Sons, 2001.
[14] M. Miyahara, "Quality assessments for visual service," IEEE Communications Magazine, vol. 26, no. 10, pp. 51-60, 81, Oct. 1988.
[15] I. Jolliffe, Principal Component Analysis. Springer, 2002.
[16] T. Oelbaum, C. Keimel, and K. Diepold, "Rule-based no-reference video quality evaluation using additionally coded videos," IEEE Journal of Selected Topics in Signal Processing, vol. 3, no.
2, pp. 294-303, Apr. 2009.
[17] F. Westad, K. Diepold, and H. Martens, "QR-PLSR: Reduced-rank regression for high-speed hardware implementation," Journal of Chemometrics, vol. 10, pp. 439-451, 1996.
[18] K. Seshadrinathan and A. Bovik, "Motion tuned spatio-temporal quality assessment of natural videos," IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 335-350, Feb. 2010.
[19] A. Ninassi, O. Le Meur, P. Le Callet, and D. Barba, "Considering temporal variations of spatial visual distortions in video quality assessment," IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 2, pp. 253-265, Apr. 2009.
[20] C. Keimel, T. Oelbaum, and K. Diepold, "Improving the prediction accuracy of video quality metrics," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), Mar. 2010, pp. 2442-2445.
[21] A. Cichocki, R. Zdunek, A. H. Phan, and S.-I. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multiway Data Analysis and Blind Source Separation. Wiley & Sons, 2009.
[22] A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences. Wiley & Sons, 2004.
[23] P. M. Kroonenberg, Applied Multiway Data Analysis. Wiley & Sons, 2008.
[24] J. Yang, D. Zhang, A. Frangi, and J. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, Jan. 2004.
[25] C. Keimel, M. Rothbucher, and K. Diepold, "Extending video quality metrics to the temporal dimension with 2D-PCR," in Image Quality and System Performance VIII, S. P. Farnand and F. Gaykema, Eds., vol. 7867. SPIE, Jan. 2011.
[26] E. Anderssen, K. Dyrstad, F. Westad, and H. Martens, "Reducing over-optimism in variable selection by cross-model validation," Chemometrics and Intelligent Laboratory Systems, vol. 84, no. 1-2, pp. 69-74, 2006.
[27] C. Keimel, J. Habigt, M. Klimpke, and K.
Diepold, "Design of no-reference video quality metrics with multiway partial least squares regression," in IEEE International Workshop on Quality of Multimedia Experience (QoMEX 2011), Sep. 2011, pp. 49-54.
[28] T. Brandão and M. P. Queluz, "No-reference quality assessment of H.264/AVC encoded video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 11, pp. 1437-1447, Nov. 2010.
[29] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, Jul. 2003.
[30] S. Winkler. (2011, Jul.) Image and video quality resources. [Online]. Available: http://stefan.winkler.net/resources.html

Fig. 8: Comparison of the presented methods (panels: linear regression, PCR, unfolding + PCR, 2D-PCR): subjective visual quality [MOS] vs. predicted visual quality [MOS] on a mean opinion score (MOS) scale from 1 to 5, worst to best quality.

TABLE I: Performance measurements: Pearson correlation, Spearman rank order correlation and RMSE. Additionally, the ratio of quality predictions outside of the given scale.

| Method | Pearson correlation | Spearman correlation | RMSE | Outside scale |
|---|---|---|---|---|
| linear regression | 0.72 | 0.72 | 1.04 | 15% |
| PCR | 0.80 | 0.82 | 0.81 | 13% |
| unfolding + PCR | 0.75 | 0.83 | 0.82 | 10% |
| 2D-PCR | 0.89 | 0.94 | 0.59 | 6% |