Scalable and Coherent Video Resizing with Per-Frame Optimization

Yu-Shuen Wang (1,2), Jen-Hung Hsiao (2), Olga Sorkine (3,4), Tong-Yee Lee (2)
(1) National Chiao Tung University, (2) National Cheng Kung University, (3) New York University, (4) ETH Zurich

Figure 1 (footage: Mammoth HD; panels: original video cube, deformed video cube, original frames, deformed frames): We introduce a scalable content-aware video retargeting method. Here, we render pairs of original and deformed motion trajectories in red and blue. Making the relative transformation of such pathlines consistent ensures temporal coherence of the resized video.

Abstract

The key to high-quality video resizing is preserving the shape and motion of visually salient objects while remaining temporally coherent. These spatial and temporal requirements are difficult to reconcile, typically leading existing video retargeting methods to sacrifice one of them and causing distortion or waving artifacts. Recent work enforces temporal coherence of content-aware video warping by solving a global optimization problem over the entire video cube. This significantly improves the results but does not scale well with the resolution and length of the input video and quickly becomes intractable. We propose a new method that solves the scalability problem without compromising the resizing quality. Our method factors the problem into spatial and time/motion components: we first resize each frame independently to preserve the shape of salient regions, and then we optimize their motion using a reduced model for each pathline of the optical flow. This factorization decomposes the optimization of the video cube into sets of subproblems whose size is proportional to a single frame's resolution and which can be solved in parallel. We also show how to incorporate cropping into our optimization, which is useful for scenes with numerous salient objects where warping alone would degenerate to linear scaling. Our results match the quality of state-of-the-art retargeting methods while dramatically reducing the computation time and memory consumption, making content-aware video resizing scalable and practical.
Keywords: content-aware video retargeting, scalability, temporal coherence

1 Introduction

Content-aware video retargeting makes it possible to resize videos and change their aspect ratios while preserving the appearance of visually important content. It has been the topic of active research in recent years due to the proliferation of video data presented in various formats on different devices, from cinema and TV screens to mobile phones. The key to high-quality video retargeting is preserving the shape and motion of salient objects while retaining a temporally coherent result. These spatial and temporal requirements are difficult to reconcile: when the resizing operation is optimized to preserve the spatial content of each video frame independently, corresponding objects in different frames inevitably undergo different transformations, and temporal artifacts such as waving may occur. Perfectly coherent resizing, such as homogeneous (linear) scaling or cropping, distorts all image content. It is difficult and sometimes impossible to avoid both spatial and temporal artifacts [Wang et al. 2009], and striking a good balance is a challenging problem.

It is possible to optimize spatial shape preservation and temporal coherence together, as shown by Wang et al. [2010]. However, their method formulates a global optimization on the entire video cube, which does not scale well and becomes intractable as the resolution or the length of the video increases. Other existing retargeting methods usually have to sacrifice one of the goals. Content-aware cropping potentially discards visually important objects and introduces virtual camera motion; it is very efficient since only a limited number of parameters (panning, zoom factor) need to be solved for each frame. Other methods employ locally-varying image deformation that adapts to the saliency information, and limit the handling of temporal coherence to a small number of frames at a time [Shamir and Sorkine 2009].
The problem size then becomes linear in the resolution of a single frame, making these methods scalable, but temporal coherence may suffer substantially since object motions are non-uniformly altered using such windowing approaches.

In this paper, we propose a new content-aware video retargeting method that is scalable without compromising temporal coherence. Our key insight is that the problem can be factored into its spatial and time/motion components, both of which can be solved efficiently and scalably. Our approach handles spatial and temporal components of the problem sequentially. First, we independently

optimize the spatial resizing of each frame without regarding the motion information. We then analyze the resulting motion trajectories in the resized video, i.e., the deformed pathlines of the input optical flow. We optimize the pathlines such that their shapes and offsets to neighboring pathlines are consistent with the input video yet also close to the result of the first stage. This may appear to be an expensive optimization, but since we use a reduced model for each pathline, the number of variables is linear in the spatial resolution of the video. The final step of our algorithm re-solves per-frame retargeting using the optimized pathlines as guides, thereby consolidating them into the final coherent result.

In addition to warping the video, we also show how to incorporate content-aware cropping into the optimization process without giving up scalability. As observed by Wang et al. [2010], cropping may be necessary in cases where the video is crowded with multiple prominent objects, or if their motion trajectories overlap with the entire background. Spatially-varying warping operations inevitably degenerate to linear scaling or cause temporal artifacts in such cases. We use the definition of temporal persistence of [Wang et al. 2010] to determine critical regions of all video frames. To leverage cropping, we first warp video frames to a natural size, where this size may be larger than the target resolution. We then pan the video frames to ensure that all critical regions fit into the target cube and crop off the regions that fall outside the cube. We augment the optimization such that cropping and warping are combined efficiently.

Our results match the quality of state-of-the-art temporally coherent retargeting methods while dramatically reducing the computation time and memory consumption. We visually compare our results with the recent techniques in the accompanying videos and report the statistics on time and space costs in Sec. 5. The scalability of our approach makes it practical to retarget videos of high resolution and length.

2 Related work

Image retargeting. Content-aware image retargeting forms the basis of our approach, as we use it to optimize the spatial content appearance of each frame.
The methods for image retargeting are generally classified into discrete and continuous techniques [Shamir and Sorkine 2009]: discrete methods remove or insert pixels to change the aspect ratio, while continuous approaches compute spatially-varying warps with the desired image dimensions as boundary constraints. Cropping [Chen et al. 2003; Liu et al. 2003; Suh et al. 2003; Santella et al. 2006], seam or region carving [Avidan and Shamir 2007; Rubinstein et al. 2008; Pritch et al. 2009] and patch-based approaches [Simakov et al. 2008; Cho et al. 2008; Barnes et al. 2009] all discard, duplicate and/or rearrange discrete portions of the image to minimize the distortion of salient image parts. They achieve excellent results, especially when several operators are combined [Rubinstein et al. 2009; Dong et al. 2009; Wu et al. 2010], as confirmed by a recent comprehensive user study [Rubinstein et al. 2010]. However, achieving temporal coherence for discrete approaches is challenging in our context, since the forward and backward mappings between the original and resized images are not each other's inverses. This precludes us from using the discrete approaches, as we rely on two-way correspondence to be able to optimize the video motion pathlines.

We can use any method from the continuous category for per-frame resizing, such as [Gal et al. 2006; Wang et al. 2008; Zhang et al. 2009; Karni et al. 2009] or the per-frame variants of the video resizing methods [Wolf et al. 2007; Krähenbühl et al. 2009]. These techniques formulate warp energy functionals that penalize distortion of salient regions, excessive bending of lines, self-intersections and more, and compute the image deformation that minimizes the energy. The optimization is usually done on a discrete mesh overlaid on the image, and full correspondence between the input and the resized image is retained. Moreover, the advantage of the continuous energy minimization approach is easy customization of the energy terms to the specific task at hand.

Video retargeting.
Video retargeting is more challenging than image retargeting because of the additional temporal coherence requirement and the need to preserve object motions. On the other hand, video offers more room for cropping, because objects cropped in one frame might be visible in the next. This motivated Wang et al. [2010] to define temporal persistence, which we also use in this work: it lets cropping shorten the time segment in which an object is visible, as long as it is present in some minimal number of frames. Other cropping approaches [Liu and Gleicher 2006; Deselaers et al. 2008; Gleicher and Liu 2008] focus on maximizing the amount of visually salient content within each cropped frame while optimizing the introduced virtual camera motion.

As discussed earlier, temporal coherence makes content-aware video resizing expensive, since multiple, if not all, frames must be considered simultaneously. Earlier works tried to retarget temporally adjacent regions consistently. Wolf et al. [2007] and Krähenbühl et al. [2009] used continuous warping with such consistency constraints, and Rubinstein et al. [2008] iteratively carved the discrete video cube using graph-cut optimization. Such approaches can be made efficient if only a limited number of previous frames is used to constrain the consistency of the next frame; streaming application then becomes possible [Zhang et al. 2008; Krähenbühl et al. 2009]. However, this leads to temporal artifacts since motion information is largely ignored.

Incorporating the optical flow alleviates these artifacts but introduces the scalability problem. Wang et al. [2009] detected camera and object motions and ensured consistent resizing of prominent foreground objects. This requires optimization on the entire video cube; Wang et al. [2009] implemented a sliding window streaming approach, but it does not fully guarantee that objects retain their shape throughout the whole video. To improve coherence, the streaming technique of Krähenbühl et al. [2009] averaged the saliency maps of several past and future frames. Niu et al. [2010] preserved camera and object motions while resizing the video frames sequentially.
They encouraged consistent resizing of foregrounds using a motion history map and maintained the backgrounds by constraining them w.r.t. the previous frame. Their results are highly dependent on the first frame because of this sequential processing.

Our approach is related to the crop-and-warp technique of Wang et al. [2010], which combines cropping and warping in one global optimization. Like them, we wish to compute the optimal trade-off between spatial and temporal distortion using energy minimization. Wang et al. [2010] solve a global optimization on the entire video; we also regard the whole video volume for motion preservation, but we factor the optimization into smaller problems, allowing our approach to scale to a large number of frames.

3 Motion-preserving scalable video warping

Pixel motions between consecutive frames together comprise the motion information of the video, which is extremely apparent to the human eye. Our goal is to preserve the coherence of these motions in the retargeted result and avoid temporal artifacts such as waving. At the same time, we wish to spatially preserve the shape of important objects, a goal which may stand in conflict with temporal coherence. We strike a balance through scalable optimization of these two objectives.

Consider the set of all points in the first frame of the video. We can trace the pathlines of these points in the optical flow of the video (they form three-dimensional trajectories, where time is the third axis). When the video is resized, the pathlines deform; incoherence of the deformation among the pathlines is what causes temporal artifacts. If the video is simply linearly resized, all pathlines undergo the same transformation, and offsets between any two pathlines in each frame are transformed by the same scaling transformation. The video stays perfectly temporally coherent, although of course all depicted objects are squeezed or stretched. On the other hand, if the offsets between two pathlines are transformed by a deformation that varies (i.e., has non-vanishing derivative w.r.t. time), this creates motion artifacts and incoherence. Temporally coherent and content-preserving video resizing should therefore minimize the temporal derivative of the pathline offset transformation and at the same time preserve the shapes of salient objects.

We can discretize the above principle in the following way: denote by $P$ the set of pathlines of the optical flow that we traced in the video; each $P_i \in P$ is a sequence of pixels $P_i = \{p_i^m, p_i^{m+1}, \ldots, p_i^n\}$, where each node $p_i^t = (x_i^t, y_i^t)$ is the position of the traced pixel at frame $t$. We may seed pathlines in the first frame of the video ($m = 1$) and also anywhere in the middle ($m > 1$); the pathline ends when the traced point leaves the frame. We place the seeding nodes on a regular grid and compute $P$ using the method of Werlberger et al. [2009]. Denote by $E$ the adjacencies of the pathlines, i.e., $\{i, j\} \in E$ if $p_i^t$ and $p_j^t$ are neighbors on the seeding grid and at least one of the pathlines $P_i$, $P_j$ started at frame $t$. We would like the offsets $p_i^t - p_j^t$ to all undergo some scaling transformation $S_{ij}$ for all $t$ ($S_{ij} \in \mathbb{R}^{2 \times 2}$ can be a non-uniform scaling matrix), so the error term can be written as

\[ E_P = \sum_{\{i,j\} \in E} \sum_{t=m}^{n} \left\| (\hat{p}_i^t - \hat{p}_j^t) - S_{ij}\,(p_i^t - p_j^t) \right\|^2, \tag{1} \]

where $\hat{p}_i^t$ is the location of $p_i^t$ in the resized video. Both the $S_{ij}$'s and the $\hat{p}_i^t$'s are unknowns here.
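To make Eq. 1 concrete, the following minimal sketch evaluates the offset energy for a toy pair of pathlines. It is an illustration under stated assumptions, not the authors' implementation: the dictionary layout, the names `offset_energy` and `scale_paths`, and the toy coordinates are all hypothetical, the scalings $S_{ij}$ are taken to be diagonal and are held fixed rather than optimized, and only frames in which both pathlines exist are summed.

```python
def offset_energy(orig, resized, edges, scalings):
    """Evaluate E_P (Eq. 1) for fixed diagonal scalings S_ij = diag(sx, sy).

    orig / resized: {pathline index: {frame: (x, y)}} (hypothetical layout).
    edges: list of (i, j) pairs of neighboring pathlines.
    """
    total = 0.0
    for (i, j) in edges:
        sx, sy = scalings[(i, j)]
        # Assumption of this sketch: sum over frames where both pathlines exist.
        for t in sorted(set(orig[i]) & set(orig[j])):
            dx = orig[i][t][0] - orig[j][t][0]
            dy = orig[i][t][1] - orig[j][t][1]
            dx_hat = resized[i][t][0] - resized[j][t][0]
            dy_hat = resized[i][t][1] - resized[j][t][1]
            total += (dx_hat - sx * dx) ** 2 + (dy_hat - sy * dy) ** 2
    return total


def scale_paths(paths, sx, sy):
    """Linear (homogeneous) resizing of every pathline node."""
    return {i: {t: (sx * x, sy * y) for t, (x, y) in p.items()}
            for i, p in paths.items()}


# Two toy pathlines traced over three frames.
paths = {
    0: {1: (10.0, 5.0), 2: (12.0, 5.0), 3: (14.0, 6.0)},
    1: {1: (30.0, 5.0), 2: (31.0, 6.0), 3: (33.0, 6.0)},
}
edges = [(0, 1)]
half_width = (0.5, 1.0)

# Linear scaling transforms every offset by the same S_ij.
linear = scale_paths(paths, *half_width)
energy_linear = offset_energy(paths, linear, edges, {(0, 1): half_width})

# A per-frame resizing that squeezes frame 2 differently from the others.
wobbly = {i: dict(p) for i, p in linear.items()}
wobbly[1][2] = (0.6 * paths[1][2][0], paths[1][2][1])
energy_wobbly = offset_energy(paths, wobbly, edges, {(0, 1): half_width})

print(energy_linear, energy_wobbly)
```

In this toy setting the linearly scaled video has zero energy, while changing the horizontal scale in a single frame makes the energy positive, which is the incoherence the term is meant to penalize.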
Combining $E_P$ and spatial energy terms that aim to preserve the shape of salient areas in each frame would result in a complete video resizing framework. However, this approach introduces a scalability problem, since all sampled pixels in all frames are involved in the energy minimization and the entire video cube must be optimized at once. Instead, we factor the problem into two separate ones: the spatial retargeting, which resizes each frame individually while preserving important objects, and the temporal dimension, which preserves the relationships between the motion pathlines. Our process consists of three sequential steps: (1) We retarget each frame separately; the salient objects are then preserved, but the motion pathlines get distorted; (2) We optimize the motion pathlines, balancing between their original and deformed shapes (from step (1)) while striving to preserve the coherent relationship between neighboring pathlines as in Eq. 1; (3) We resize each video frame again, using the positions of the pathline nodes from step (2) as guides. We will see that this factorization allows us to keep the number of variables proportional to a single frame's resolution $N$, so that we need to solve $O(T)$ independent problems of size $O(N)$ ($T$ being the total number of frames), which can be done in parallel, as opposed to solving one optimization problem with $O(NT)$ variables.

Figure 2: The original, linearly scaled, per-frame resized and the optimal motion pathlines are shown in red, gray, green and blue, respectively, projected onto the $(x, t)$ plane. Note that the horizontal offsets between the pathlines are consistently reduced in the linearly scaled and the optimized trajectories.

Step 1: Per-frame resizing. We can employ any image retargeting method to resize the individual frames, as long as per-pixel correspondences between the original and resized images can be obtained; any variational warping method, e.g. [Gal et al. 2006; Wang et al. 2008; Krähenbühl et al. 2009; Zhang et al. 2009; Karni et al. 2009], is suitable, while the discrete approaches are not, since they do not allow us to easily establish the correspondence between $p_i^t$ and $\hat{p}_i^t$.
We combine the gradient magnitudes of the pixels' colors and optical flow vectors, as well as face detection, to compute the saliency maps that guide the per-frame retargeting operator. We chose to use the scale-and-stretch method of Wang et al. [2008], where salient objects undergo similarity transformations.

Step 2: Optimization of the motion pathlines. Step 1 may distort motion information since each frame is resized independently and motion is not considered. We correct the motion pathlines by optimizing the offset deformation between neighboring pathlines, encouraging it towards constant scaling, as in Eq. 1 (see Fig. 2). To reduce the number of involved variables, we model the deformation of each pathline as translation plus scaling along the $x$, $y$ axes: $\hat{P}_i = S_i P_i + \mathbf{t}_i$, thereby reducing the unknowns $\hat{p}_i^m, \ldots, \hat{p}_i^n$ to just a single (non-uniform) scaling matrix $S_i$ and a translation vector $\mathbf{t}_i$ per pathline $P_i$. We rewrite Eq. 1 as

\[ \Omega_P = \sum_{\{i,j\} \in E} \sum_{t=m}^{n} \left\| \left( (S_i p_i^t + \mathbf{t}_i) - (S_j p_j^t + \mathbf{t}_j) \right) - S_{ij}\,(p_i^t - p_j^t) \right\|^2. \tag{2} \]

We balance between temporal coherence expressed above and spatial shape preservation achieved in Step 1 by considering the distance to the pathlines resulting from Step 1:

\[ \Omega_D = \sum_{P_i} \sum_{t=m}^{n} \left\| (S_i p_i^t + \mathbf{t}_i) - q_i^t \right\|^2, \tag{3} \]

where $Q_i = \{q_i^m, \ldots, q_i^n\}$ is the deformed version of $P_i$ after Step 1. We minimize $\Omega_P + \mu\,\Omega_D$ to solve for $S_i$, $S_{ij}$, $\mathbf{t}_i$ and obtain the optimized pathlines as

\[ \hat{P}_i = S_i P_i + \mathbf{t}_i. \tag{4} \]

The parameter $\mu$ balances the spatial and temporal constraints. We set $\mu = 0.5$ in our system.

Step 3: Motion-guided per-frame resizing. To consolidate the optimized pathlines into one coherent video, we repeat the content-aware retargeting of each frame, adding to the warping energy of frame $t$ the locations of the pathline nodes at time $t$ ($\hat{p}_i^t$) as positional constraints:

\[ \Omega_H = \sum_{P_i} \left\| \tilde{p}_i^t - \hat{p}_i^t \right\|^2, \tag{5} \]

where $\tilde{p}_i^t$ are the final node positions we are optimizing in this step.
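The reduced model of Step 2 replaces all node positions of a pathline by four numbers (two scale factors and a translation). As a hedged illustration, the sketch below fits those four numbers to the Step 1 result for a single pathline in isolation, i.e., it minimizes only the data term of Eq. 3 for one $P_i$ and ignores the coupling term $\Omega_P$ that the full method solves jointly. The function names and the toy data are hypothetical.

```python
def fit_scale_translation(p, q):
    """Least-squares s, t minimizing sum_k (s * p[k] + t - q[k])**2 on one axis."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    var_p = sum((a - mean_p) ** 2 for a in p)
    if var_p == 0.0:
        # A stationary coordinate carries no scale information;
        # fall back to a pure translation (a choice made for this sketch).
        return 1.0, mean_q - mean_p
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    s = cov / var_p
    return s, mean_q - s * mean_p


def fit_pathline(P, Q):
    """Fit P_hat = S * P + t with S = diag(sx, sy) to the Step 1 pathline Q."""
    sx, tx = fit_scale_translation([x for x, _ in P], [x for x, _ in Q])
    sy, ty = fit_scale_translation([y for _, y in P], [y for _, y in Q])
    P_hat = [(sx * x + tx, sy * y + ty) for x, y in P]
    return (sx, sy), (tx, ty), P_hat


# Toy pathline over four frames and its (hypothetical) per-frame resized version.
P = [(10.0, 5.0), (12.0, 5.0), (14.0, 6.0), (16.0, 6.0)]
Q = [(8.0, 5.0), (9.0, 5.0), (10.0, 6.0), (11.0, 6.0)]

scale, translation, P_hat = fit_pathline(P, Q)
print(scale, translation)
```

Because each axis has only one scale and one offset, the fit is an ordinary one-dimensional linear regression. In the full method the same unknowns are additionally tied to those of neighboring pathlines through Eq. 2, so they cannot be fitted independently like this.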

Figure 3 (footage: Blender Foundation; panels: warping to natural width γ, frame panning, cropping): Our cropping and warping process. The target video cube is depicted in pink. To reduce the width of a video (left), our first step is to warp the frames to a natural size γ (middle left), where this size may be larger than the desired width. To incorporate cropping, we translate the video frames so that all critical regions lie within the cube (middle right) and finally discard the outer regions (right).

Discretization details. We overlay regular quad grids on each video frame; denote their vertices by $v_j^t$. A typical quad size is 20 × 20 pixels. The vertices of the first frame's grid are used to seed the motion pathlines $P_i$. A pathline may end before the last frame if the motion trajectory goes outside the frame; as a result, in some frames there may be grid vertices that have no pathlines in their surrounding quads. We use such vertices to seed more pathlines to create a more uniform distribution of pathline samples.

The pathlines are defined at the pixel level, yet we use a coarser grid mesh when computing the warp. Therefore, we represent the pathline location using the quad vertices surrounding it. Namely, we use mean-value coordinates $p_i^t = \sum_{k \in V(p_i^t)} w_k v_k^t$ to reformulate Eq. 5 in terms of the unknown deformed grid vertices, where $V(p_i^t)$ are the vertex indices of the grid quad that $p_i^t$ belongs to. We obtain the least-squares positional constraints

\[ \Omega_T = \sum_{P_i} \Big\| \sum_{j \in V(p_i^t)} w_j \tilde{v}_j^t - \hat{p}_i^t \Big\|^2. \tag{6} \]

4 Combining crop with per-frame retargeting

As explained in [Wang et al. 2010], video retargeting methods that strive to preserve both salient spatial content and temporal coherence necessarily degenerate into linear scaling when the video is densely populated with prominent objects or when some foreground objects overlap with the entire background in the course of their motion. To remedy this, Wang et al. [2010] proposed to combine warp-based resizing with cropping. They determine a critical region for each frame, which contains active foreground objects or content that is invisible in the following frames.
Non-critical regions are allowed to be discarded; the actual amount of cropping is woven into the global optimization problem. We would like to employ the same technique to improve our retargeting results while avoiding global optimization over the entire video cube. We mimic the logic of the crop-and-warp technique on a per-frame basis. In the following, we describe the technique for width-reducing resizing; stretching the video can be achieved equivalently by reducing its height and then uniformly scaling to the desired resolution.

We compute the critical regions using the method of Wang et al. [2010]; the critical regions are contained between two vertical lines in each frame. Denote by $W$ the original width of the video and by $W_{\text{target}}$ the target width. To combine cropping and warping, we will warp each frame to a width γ that is larger than $W_{\text{target}}$, such that the content that does not fit into the target video cube will be discarded. However, we must make sure that all critical regions survive after retargeting, i.e., their widths after retargeting have to be smaller than $W_{\text{target}}$. Since the retargeted widths of the critical regions are unknown a priori, we estimate an upper bound β of the desired width γ, i.e., $W_{\text{target}} \le \gamma \le \beta$. We do this by taking the frame with the widest critical region and testing different widths until the retargeted critical region fits into $W_{\text{target}}$. Specifically, we repeatedly reduce the frame's width by 5 pixels using the content-aware image warping approach. Theoretically, other frames could still have critical regions larger than $W_{\text{target}}$ when retargeted to β, but we found this heuristic to work well in practice.

Figure 4 (axis marks from left to right: 0, $W_{\text{target}}$, γ, β, $W$): $W$ is the input video width and $W_{\text{target}}$ is the target width. To determine the natural width γ, we warp each video frame with soft boundary constraints, with an upper bound β on the resulting width. The warped frames have different sizes due to the different saliency maps. We set the natural width γ as the average of the warped frame widths. The upper bound β ensures that the width of the critical region is smaller than $W_{\text{target}}$ and fits into the target video cube.

We combine cropping into our system by warping each video frame to a natural width γ.
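The search for the upper bound β and the averaging that yields γ can be sketched as follows. This is a minimal illustration under stated assumptions: `critical_width_at` is a hypothetical callback standing in for "warp the frame with the widest critical region to a given width with the content-aware operator and measure its critical region", and the toy model and frame widths below are invented for the example.

```python
def estimate_upper_bound(frame_width, target_width, critical_width_at, step=5):
    """Shrink the frame in `step`-pixel decrements until the retargeted
    critical region fits into the target width; return that frame width."""
    width = frame_width
    while width > target_width and critical_width_at(width) > target_width:
        width = max(width - step, target_width)
    return width


def natural_width(warped_widths):
    """Natural width: the average of the soft-constrained warped frame widths."""
    return sum(warped_widths) / len(warped_widths)


# Hypothetical stand-in for the content-aware warp: the critical region is
# 400 px wide in the 640 px input and shrinks by 2 px per 5 px of frame width.
def toy_critical_width(width):
    return 400 - (2 * (640 - width)) // 5


beta = estimate_upper_bound(640, 320, toy_critical_width)

# Hypothetical per-frame widths from the soft-constrained pre-warp, each <= beta.
gamma = natural_width([330, 360, 345, 365])
print(beta, gamma)
```

The 5-pixel decrement follows the text; everything about how the critical region responds to the warp is an assumption of the toy callback and would come from the actual per-frame retargeting operator in practice.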
We then pan the video frames such that all critical regions slide into the target video cube, and we crop the video. The steps of this process are illustrated in Fig. 3 and 4, and detailed below.

Step 1: Natural-width frame warping. We pre-warp each frame independently using a soft constraint on its width to determine the natural video width $\gamma \le \beta$. Specifically, we constrain the $x$ coordinate of the top-left vertex of each frame to 0 and the bottom-right one (denoted $\tilde{v}_{br,x}$) softly to $W_{\text{target}}$ by using the energy term:

\[ \Omega_C = \lambda \left| \tilde{v}_{br,x} - W_{\text{target}} \right|^2 \quad \text{subject to} \quad \tilde{v}_{br,x} \le \beta, \tag{7} \]

where $\lambda = 0.05$ is the weighting factor used in our system. This least-squares term replaces the original constraints on the $x$ coordinates in the warping method (as mentioned, we employ [Wang et al. 2008]) while the constraints on the $y$ coordinates of the boundary remain the same. Warping with such soft constraints makes the

widths $\tilde{v}_{br,x}$ vary from frame to frame, depending on the content and saliency of each frame. Note that the upper bound β makes sure that all critical regions will fit into the target cube. We set the natural width γ as the average of the frame widths (Fig. 4). We finally warp all frames to width γ using our new algorithm presented in Sec. 3, where the spatial and temporal aspects are both considered.

Figure 5 (footage: ARS Film Production; panels: original, linear scaling, [Wang et al. 2010], our method): This example shows that the quality of our method is comparable to that of Wang et al. [2010], although the results are not exactly the same. The man is preserved better by [Wang et al. 2010] but the child is preserved better by our method. All results are temporally coherent, but the linear scaling method squeezes everything. Please refer to the accompanying video for the footage.

Step 2: Frame panning. Since $W_{\text{target}} \le \gamma$, we lastly translate the frames such that each critical region fits into the target cube. To do this, we detect the frames whose critical regions ended up closest to the left (right) boundaries and we translate those frames such that they sit exactly at the left (right) boundary of the target cube. We call these frames keyframes and we smoothly interpolate their panning to the rest of the video using splines.

Step 3: Cropping. We discard the video content outside the target cube to complete the video retargeting process. Note that the cropping does not lead to significant content loss, since the content of the discarded parts persists for a while in the target video in other frames.

5 Results and discussion

We implemented and tested our algorithm on a desktop PC with a Core i5 2.66 GHz CPU and 8 GB of RAM. Each tested video clip represents a single scene, since there is no need for coherent resizing across scene cuts. We utilize the method of Rasheed and Shah [2003] to segment long input videos into individual scenes.

Quality. We ran our algorithm on a large number of videos, including footage with complicated scenes and multiple challenging motions. We found that our results are comparable to those presented by Wang et al. [2010], which is the most recent state-of-the-art content-aware video retargeting method. Due to the different strategies used to preserve temporal coherence, not all results are identical, but most of them are similar. In some cases, our results are even better since the motion pathline optimization is global for the entire video clip, whereas Wang et al. [2010] apply temporal constraints only locally (to neighboring frames). We also compare our method to [Krähenbühl et al. 2009], a highly efficient online algorithm that supports streaming. Since this method does not consider the motion information of the optical flow, it can lead to waving artifacts. We show the comparisons in Figures 5, 6 and our accompanying videos. Please note that the waving artifacts can only be observed in videos.

quad size (pixels)   20 × 20    10 × 10    5 × 5      3 × 3
our mem.             22 MB      100 MB     432 MB     1.2 GB
Wang's mem.          175 MB     688 MB     3.8 GB     -
our time             2.2 sec    10 sec     41 sec     95 sec
Wang's time          24 sec     63 sec     286 sec    -

Table 1: We resize a 688 × 288 pixel resolution video with 224 frames using different sizes of grid meshes to compare the costs of our method and [Wang et al. 2010]. The costs of optical flow and saliency computation are not included since they are not our contributions and are equal for both approaches. A dash means that the method cannot handle certain resolutions.

Performance. As discussed earlier, although all motion pathlines are solved together to retain coherence, the unknowns of each pathline are only a scaling matrix and a translation vector, and we need an additional scaling matrix per edge between neighboring pathlines. Hence the number of variables for temporal optimization is linear in the video resolution $N$, with a small constant (there are 2 unknowns for each scaling matrix and 2 more for each translation vector). The subsequent per-frame resizing step requires solving $O(T)$ independent optimizations, each having $O(N)$ unknowns, where $T$ is the number of frames. Hence, our method can run in parallel. By contrast, the method of Wang et al. [2010] requires solving an optimization problem with $O(NT)$ unknowns and is not easily parallelizable. We thus achieve higher performance, and our technique scales linearly. This advantage is especially notable when handling long videos, as can be seen in Fig. 7: the time spent per frame remains more or less constant as the video length increases. We show the comparative timing statistics in Table 1.

We minimize the objective functionals using a CPU-based conjugate gradient solver. Since neighboring frames usually have similar deformations, we consider the result of the previous frame as an initial guess for the next one, such that the optimization can converge in fewer iterations. We do not apply a direct solver like previous works, since it cannot benefit from a good initial guess, and matrix factorization is expensive. In addition, we do not employ the GPU to speed up the solver due to the overhead of transferring data between the main memory and the graphics memory, which is problematic in our setting where we solve moderately-sized but numerous per-frame optimizations. Instead, we developed the code with OpenMP to benefit from CPU-based parallel processing.

Figure 6 (footage: Blender Foundation; panels: original, [Krähenbühl et al. 2009], [Wang et al. 2010], our method): We compare our method with [Krähenbühl et al. 2009] and [Wang et al. 2010]. Since [Krähenbühl et al. 2009] does not explicitly take motion information into account, the resized tree widens when the scene is moving left. In contrast, [Wang et al. 2010] and our method do not have this problem.

Memory consumption. Our algorithm never requires the entire video cube at once and greatly saves memory space compared to global cube optimization. We show the peak memory usage statistics in Table 1 for different grid mesh sizes. Peak usage occurs during the optimization of motion pathlines since each offset between neighboring pathlines is considered. Compared to [Wang et al. 2010], our memory consumption is significantly lower, even when using a high-resolution grid mesh. It is also worth noting that our memory footprint size is nearly independent of the video length, thanks to the constant number of unknowns per motion pathline. In contrast, the memory consumption of the method in [Wang et al. 2010] is proportional to the video length since all deformed grid vertices need to be solved simultaneously. As can be seen, the method of Wang et al. [2010] fails for large resolutions (or a large number of frames) because its memory requirements become excessive.

Limitations. Our system solves video frames individually to achieve scalability. In order to preserve temporal coherence, however, it has to optimize motion pathlines over the entire clip, as well as compute critical regions of all frames in advance. This prevents us from realizing a streaming implementation, which is necessary for online retargeting. It would be possible to consider the motion pathlines in a bounded number of frames. In our experiments, the waving artifacts are hardly noticeable for window sizes of 100 frames and above. However, when combining with cropping, the maximal critical region size may dramatically differ between different parts of the video. Without the examination of all video frames, the combination ratios between cropping and warping would be inconsistent and the resulting distortions would be noticeable even for large window sizes.

Figure 7 (plot; vertical axis: FPS, horizontal axis: video length in frames): We test the scalability of our method by plotting the number of processed frames per second when retargeting increasingly long portions of a 900-frame video clip. The dashed line shows the average FPS. The FPS remains more or less constant (the actual time somewhat depends on the content of the video).

6 Conclusions

We introduced a content- and motion-aware video retargeting system which achieves scalability. Thanks to the factorization of the problem into individual per-frame optimization of spatial content and motion pathlines for temporal coherence, a global optimization of the entire video cube is no longer necessary, thereby greatly reducing the computational cost and memory consumption. This is an important advantage in view of the increasing resolution of videos commonly available to consumers, both professional footage such as news or entertainment programs, and casual self-recorded video.

Retargeting a video may require user input to specify semantically meaningful or interesting regions according to the artist's intentions; automatic saliency measures are still imperfect. Having an

interactive algorithm to resize videos is thus important, so that editing the saliency information results in immediate feedback. In future work, we would like to extend our method and design a system capable of streaming-based online retargeting.

Acknowledgements

We thank the anonymous reviewers for their constructive comments. We are also grateful to Annie Ytterberg for narrating the accompanying video, to Tino Weinkauf for helping us with the figures, and to Christa C. Y. Chen for her help with the video materials and licensing. The usage of the video clips is permitted by ARS Film Production, Blender Foundation and MAMMOTH HD. This work was supported in part by the Landmark Program of the NCKU Top University Project (contract B0008).

References

AVIDAN, S., AND SHAMIR, A. 2007. Seam carving for content-aware image resizing. ACM Trans. Graph. 26, 3.
BARNES, C., SHECHTMAN, E., FINKELSTEIN, A., AND GOLDMAN, D. B. 2009. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 28, 3.
CHEN, L. Q., XIE, X., FAN, X., MA, W. Y., ZHANG, H. J., AND ZHOU, H. Q. 2003. A visual attention model for adapting images on small displays. ACM Multimedia Systems Journal 9, 4, 353–364.
CHO, T. S., BUTMAN, M., AVIDAN, S., AND FREEMAN, W. T. 2008. The patch transform and its applications to image editing. In Proc. CVPR '08.
DESELAERS, T., DREUW, P., AND NEY, H. 2008. Pan, zoom, scan: time-coherent, trained automatic video cropping. In CVPR '08.
DONG, W., ZHOU, N., PAUL, J.-C., AND ZHANG, X. 2009. Optimized image resizing using seam carving and scaling. ACM Trans. Graph. 28, 5.
GAL, R., SORKINE, O., AND COHEN-OR, D. 2006. Feature-aware texturing. In Proc. EGSR '06, 297–303.
GLEICHER, M. L., AND LIU, F. 2008. Re-cinematography: Improving the camerawork of casual video. ACM Trans. Multimedia Comput. Commun. Appl. 5, 1, 1–28.
KARNI, Z., FREEDMAN, D., AND GOTSMAN, C. 2009. Energy-based image deformation. Comput. Graph. Forum 28, 5, 1257–1268.
KRÄHENBÜHL, P., LANG, M., HORNUNG, A., AND GROSS, M. 2009. A system for retargeting of streaming video. ACM Trans. Graph. 28, 5.
LIU, F., AND GLEICHER, M. 2006. Video retargeting: automating pan and scan. In Proc. Multimedia '06, 241–250.
LIU, H., XIE, X., MA, W.-Y., AND ZHANG, H.-J. 2003. Automatic browsing of large pictures on mobile devices. In Proc. ACM International Conference on Multimedia, 148–155.
NIU, Y., LIU, F., LI, X., AND GLEICHER, M. 2010. Warp propagation for video resizing. In Proc. CVPR, 537–544.
PRITCH, Y., KAV-VENAKI, E., AND PELEG, S. 2009. Shift-map image editing. In Proc. ICCV '09.
RASHEED, Z., AND SHAH, M. 2003. Scene detection in Hollywood movies and TV shows. In Proc. CVPR, II-343-8.
RUBINSTEIN, M., SHAMIR, A., AND AVIDAN, S. 2008. Improved seam carving for video retargeting. ACM Trans. Graph. 27, 3.
RUBINSTEIN, M., SHAMIR, A., AND AVIDAN, S. 2009. Multi-operator media retargeting. ACM Trans. Graph. 28, 3, 23.
RUBINSTEIN, M., GUTIERREZ, D., SORKINE, O., AND SHAMIR, A. 2010. A comparative study of image retargeting. ACM Trans. Graph. 29, 5.
SANTELLA, A., AGRAWALA, M., DECARLO, D., SALESIN, D., AND COHEN, M. 2006. Gaze-based interaction for semi-automatic photo cropping. In Proc. CHI, 771–780.
SHAMIR, A., AND SORKINE, O. 2009. Visual media retargeting. In ACM SIGGRAPH Asia Courses.
SIMAKOV, D., CASPI, Y., SHECHTMAN, E., AND IRANI, M. 2008. Summarizing visual data using bidirectional similarity. In Proc. CVPR '08.
SUH, B., LING, H., BEDERSON, B. B., AND JACOBS, D. W. 2003. Automatic thumbnail cropping and its effectiveness. In Proc. UIST, 95–104.
WANG, Y.-S., TAI, C.-L., SORKINE, O., AND LEE, T.-Y. 2008. Optimized scale-and-stretch for image resizing. ACM Trans. Graph. 27, 5, 118.
WANG, Y.-S., FU, H., SORKINE, O., LEE, T.-Y., AND SEIDEL, H.-P. 2009. Motion-aware temporal coherence for video resizing. ACM Trans. Graph. 28, 5.
WANG, Y.-S., LIN, H.-C., SORKINE, O., AND LEE, T.-Y. 2010. Motion-based video retargeting with optimized crop-and-warp. ACM Trans. Graph. 29, 4, article no. 90.
WERLBERGER, M., TROBIN, W., POCK, T., WEDEL, A., CREMERS, D., AND BISCHOF, H. 2009. Anisotropic Huber-L1 optical flow. In Proc. British Machine Vision Conference (BMVC).
WOLF, L., GUTTMANN, M., AND COHEN-OR, D. 2007. Non-homogeneous content-driven video-retargeting. In ICCV '07.
WU, H., WANG, Y.-S., FENG, K.-C., WONG, T.-T., LEE, T.-Y., AND HENG, P.-A. 2010. Resizing by symmetry-summarization. ACM Trans. Graph. 29, 6, 159:1–159:9.
ZHANG, Y.-F., HU, S.-M., AND MARTIN, R. R. 2008. Shrinkability maps for content-aware video resizing. In Proc. PG '08.
ZHANG, G.-X., CHENG, M.-M., HU, S.-M., AND MARTIN, R. R. 2009. A shape-preserving approach to image resizing. Comput. Graph. Forum 28, 7, 1897–1906.