Max-Margin Early Event Detectors


Minh Hoai and Fernando De la Torre
Robotics Institute, Carnegie Mellon University

Abstract
The need for early detection of temporal events from sequential data arises in a wide spectrum of applications ranging from human-robot interaction to video security. While temporal event detection has been extensively studied, early detection is a relatively unexplored problem. This paper proposes a maximum-margin framework for training temporal event detectors to recognize partial events, enabling early detection. Our method is based on Structured Output SVM, but extends it to accommodate sequential data. Experiments on datasets of varying complexity, for detecting facial expressions, hand gestures, and human activities, demonstrate the benefits of our approach. To the best of our knowledge, this is the first paper in the literature of computer vision that proposes a learning formulation for early event detection.

1. Introduction
The ability to make reliable early detection of temporal events has many potential applications in a wide range of fields, ranging from security (e.g., pandemic attack detection), environmental science (e.g., tsunami warning) to healthcare (e.g., risk-of-falling detection) and robotics (e.g., affective computing). A temporal event has a duration, and by early detection, we mean to detect the event as soon as possible, after it starts but before it ends, as illustrated in Fig. 1. To see why it is important to detect events before they finish, consider a concrete example of building a robot that can affectively interact with humans. Arguably, a key requirement for such a robot is its ability to accurately and rapidly detect the human emotional states from facial expression so that appropriate responses can be made in a timely manner. More often than not, a socially acceptable response is to imitate the current human behavior. This requires facial events such as smiling or frowning to be detected even before they are complete; otherwise, the imitation response would be out of synchronization.
Despite the importance of early detection, few machine learning formulations have been explicitly developed for early detection. Most existing methods (e.g., [5, 3, 6,,, 9]) for event detection are designed for offline processing. They have a limitation for processing sequential data as they are only trained to detect complete events. But for early detection, it is necessary to recognize partial events, which are ignored in the training process of existing event detectors.

Figure 1. How many frames do we need to detect a smile reliably? Can we even detect a smile before it finishes? Existing event detectors are trained to recognize complete events only; they require seeing the entire event for a reliable decision, preventing early detection. We propose a learning formulation to recognize partial events, enabling early detection.

This paper proposes Max-Margin Early Event Detectors (MMED), a novel formulation for training event detectors that recognize partial events, enabling early detection. MMED is based on Structured Output SVM (SOSVM) [7], but extends it to accommodate the nature of sequential data. In particular, we simulate the sequential frame-by-frame data arrival for training time series and learn an event detector that correctly classifies partially observed sequences. Fig. 2 illustrates the key idea behind MMED: partial events are simulated and used as positive training examples. It is important to emphasize that we train a single event detector to recognize all partial events. But MMED does more than augment the set of training examples; it trains a detector to localize the temporal extent of a target event, even when the target event has yet to finish. This requires monotonicity of the detection function with respect to the inclusion relationship between partial events: the detection score (confidence) of a partial event cannot exceed the score of an encompassing partial event. MMED provides a principled mechanism to achieve this monotonicity, which cannot be assured by a naive solution that simply augments the set of training examples.
The learning formulation of MMED is a constrained quadratic optimization problem. This formulation is theoretically justified. In Sec. 3.2, we discuss two ways for quantifying the loss for continuous detection on sequential data. We prove that, in both cases, the objective of the learning formulation is to minimize an upper bound of the true loss on the training data. MMED has numerous benefits. First, MMED inherits the advantages of SOSVM, including its convex learning formulation and its ability for accurate localization of event boundaries. Second, MMED, specifically designed for early detection, is superior to SOSVM and other competing methods regarding the timeliness of the detection. Experiments on datasets of varying complexity, ranging from sign language to facial expression and human actions, showed that our method often made faster detections while maintaining comparable or even better accuracy.

Figure 2. Given a training time series that contains a complete event, we simulate the sequential arrival of training data and use partial events as positive training examples. The red segments indicate the temporal extents of the partial events. We train a single event detector to recognize all partial events, but our method does more than augment the set of training examples.

2. Previous work
This section discusses previous work on early detection and event detection.

2.1. Early detection
While event detection has been studied extensively in the literature of computer vision, little attention has been paid to early detection. Davis and Tyagi [] addressed rapid recognition of human actions using the probability ratio test. This is a passive method for early detection; it assumes that a generative HMM for an event class, trained in a standard way, can also generate partial events. Similarly, Ryoo [5] took a passive approach for early recognition of human activities; he developed two variants of the bag-of-words representation to mainly address the computational issues, not timeliness or accuracy, of the detection process. Previous work on early detection exists in other fields, but its applicability in computer vision is unclear. Neill et al. [] studied disease outbreak detection.
Their approach, like online change-point detection [3], is based on detecting the locations where abrupt statistical changes occur. This technique, however, cannot be applied to detect temporal events such as smiling and frowning, which must and can be detected and recognized independently of the background. Brown et al. [] used the n-gram model for predictive typing, i.e., predicting the next word from previous words. However, it is hard to apply their method to computer vision, which does not have a well-defined language model yet. Early detection has also been studied in the context of spam filtering, where immediate and irreversible decisions must be made whenever an email arrives. Assuming spam messages were similar to one another, Haider et al. [6] developed a method for detecting batches of spam messages based on clustering. But visual events such as smiling or frowning cannot be detected and recognized just by observing the similarity between constituent frames, because this characteristic is neither requisite nor exclusive to these events.

It is important to distinguish between forecasting and detection. Forecasting predicts the future while detection interprets the present. For example, financial forecasting (e.g., [8]) predicts the next day's stock index based on the current and past observations. This technique cannot be directly used for early event detection because it predicts the raw value of the next observation instead of recognizing the event class of the current and past observations. Perhaps, forecasting the future is a good first step for recognizing the present, but this two-stage approach has a disadvantage because the former may be harder than the latter. For example, it is probably easier to recognize a partial smile than to predict when it will end or how it will progress.

2.2. Event detection
This section reviews SVM, HMM, and SOSVM, which are among the most popular algorithms for training event detectors. None of them are specifically designed for early detection.
Let $(X^1, y^1), \dots, (X^n, y^n)$ be the set of training time series and their associated ground truth annotations for the events of interest. Here we assume each training sequence contains at most one event of interest, as a training sequence containing several events can always be divided into smaller subsequences of single events. Thus $y^i = [s^i, e^i]$ consists of two numbers indicating the start and the end of the event in time series $X^i$. Suppose the length of an event is bounded by $l_{\min}$ and $l_{\max}$, and let $\mathcal{Y}(t)$ denote the set of length-bounded time intervals from the 1st to the $t$-th frame:
$$\mathcal{Y}(t) = \{y \in \mathbb{N}^2 \mid y \subset [1, t],\ l_{\min} \le |y| \le l_{\max}\} \cup \{\emptyset\}.$$
Here $|\cdot|$ is the length function. For a time series $X$ of length $l$, $\mathcal{Y}(l)$ is the set of all possible locations of an event; the empty segment, $y = \emptyset$, indicates no event occurrence. For an interval $y = [s, e] \in \mathcal{Y}(l)$, let $X_y$ denote the subsegment of $X$ from frame $s$ to $e$ inclusive. Let $g(X)$ denote the output of the detector, which is the segment that maximizes
the detection score:
$$g(X) = \operatorname*{argmax}_{y \in \mathcal{Y}(l)} f(X_y; \theta). \qquad (1)$$
The output of the detector may be the empty segment, and if it is, we report no detection. $f(X_y; \theta)$ is the detection score of segment $X_y$, and $\theta$ is the parameter of the score function. Note that the detector searches over temporal scales from $l_{\min}$ to $l_{\max}$. In testing, this process can be repeated to detect multiple target events, if more than one event occurs.

How is $\theta$ learned? Binary SVM methods learn $\theta$ by requiring the score of positive training examples to be greater than or equal to 1, i.e., $f(X^i_{y^i}; \theta) \ge 1$, while constraining the score of negative training examples to be smaller than or equal to $-1$. Negative examples can be selected in many ways; a simple approach is to choose random segments of training time series that do not overlap with positive examples. HMM methods define $f(\cdot, \theta)$ as the log-likelihood and learn $\theta$ that maximizes the total log-likelihood of positive training examples, i.e., maximizing $\sum_i f(X^i_{y^i}; \theta)$. HMM methods ignore negative training examples. SOSVM methods learn $\theta$ by requiring the score of a positive training example $X^i_{y^i}$ to be greater than the score of any other segment from the same time series, i.e., $f(X^i_{y^i}; \theta) > f(X^i_y; \theta)\ \forall y \ne y^i$. SOSVM further requires this constraint to be well satisfied by a margin: $f(X^i_{y^i}; \theta) \ge f(X^i_y; \theta) + \Delta(y^i, y)\ \forall y \ne y^i$, where $\Delta(y^i, y)$ is the loss of the detector for outputting $y$ when the desired output is $y^i$ []. Though optimizing different learning objectives and constraints, all of these aforementioned methods use the same set of positive examples. They are trained to recognize complete events only, and are therefore inadequately prepared for the task of early detection.

3. Max-Margin Early Event Detectors
As explained above, existing methods do not train detectors to recognize partial events. Consequently, using these methods for online prediction would lead to unreliable decisions as we will illustrate in the experimental section. This section derives a learning formulation to address this problem.
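To make the detector of Sec. 2.2 concrete, here is a minimal sketch of the exhaustive search over the candidate set Y(l). It is not the authors' implementation: the names `candidate_segments` and `detect` are hypothetical, the score function is passed in as an arbitrary callable, and the empty segment is assumed to score 0 (as the linear score function of Sec. 3.1 does).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive, 1-indexed interval [s, e]


def candidate_segments(t: int, l_min: int, l_max: int) -> List[Optional[Segment]]:
    """Enumerate Y(t): intervals inside [1, t] whose length lies in
    [l_min, l_max], plus the empty segment (represented by None)."""
    segments: List[Optional[Segment]] = [None]
    for s in range(1, t + 1):
        for length in range(l_min, l_max + 1):
            e = s + length - 1
            if e > t:
                break  # longer segments starting at s would also overrun t
            segments.append((s, e))
    return segments


def detect(
    x: Sequence[float],
    score: Callable[[Sequence[float]], float],
    l_min: int,
    l_max: int,
) -> Tuple[Optional[Segment], float]:
    """Return the segment with the highest score and that score.

    Assumption: the empty segment scores 0, so a non-empty segment is
    reported only if its score is strictly positive.
    """
    best_segment: Optional[Segment] = None
    best_score = 0.0
    for segment in candidate_segments(len(x), l_min, l_max):
        if segment is None:
            continue  # already accounted for by the initial (None, 0.0)
        s, e = segment
        value = score(x[s - 1:e])  # convert to 0-indexed, half-open slice
        if value > best_score:
            best_segment, best_score = segment, value
    return best_segment, best_score


# Toy usage: the "score" is just the sum of the signal inside the segment.
toy = [-1.0, 2.0, 3.0, -1.0]
print(detect(toy, sum, l_min=1, l_max=3))  # → ((2, 3), 5.0)
```

The brute-force loop is quadratic in the sequence length for fixed length bounds; it is meant only to illustrate what the argmax ranges over, not how an efficient detector would be built.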
We use the same notations as described in Sec. 2.2.

3.1. Learning with simulated sequential data
Let $\varphi(X_y)$ be the feature vector for segment $X_y$. We consider a linear detection score function:
$$f(X_y; \theta) = \begin{cases} w^T \varphi(X_y) + b & \text{if } y \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
Here $\theta = (w, b)$, $w$ is the weight vector and $b$ is the bias term. From now on, for brevity, we use $f(X_y)$ instead of $f(X_y; \theta)$ to denote the score of segment $X_y$.

To support early detection of events in time series data, we propose to use partial events as positive training examples (Fig. 2). In particular, we simulate the sequential arrival of training data as follows. Suppose the length of $X^i$ is $l^i$. For each time $t = 1, \dots, l^i$, let $y^i_t$ be the part of event $y^i$ that has already happened, i.e., $y^i_t = y^i \cap [1, t]$, which is possibly empty. Ideally, we want the output of the detector on time series $X^i$ at time $t$ to be the partial event, i.e.,
$$g(X^i_{[1,t]}) = y^i_t. \qquad (3)$$
Note that $g(X^i_{[1,t]})$ is not the output of the detector running on the entire time series $X^i$. It is the output of the detector on the subsequence of time series $X^i$ from the first frame to the $t$-th frame only, i.e.,
$$g(X^i_{[1,t]}) = \operatorname*{argmax}_{y \in \mathcal{Y}(t)} f(X^i_y). \qquad (4)$$
From (3)-(4), the desired property of the score function is:
$$f(X^i_{y^i_t}) \ge f(X^i_y) \quad \forall y \in \mathcal{Y}(t). \qquad (5)$$
This constraint requires the score of the partial event $y^i_t$ to be higher than the score of any other time series segment $y$ which has been seen in the past, $y \subset [1, t]$. This is illustrated in Fig. 3. Note that the score of the partial event is not required to be higher than the score of a future segment.

As in the case of SOSVM, the previous constraint can be required to be well satisfied by an adaptive margin. This margin is $\Delta(y^i_t, y)$, the loss of the detector for outputting $y$ when the desired output is $y^i_t$ (in our case $\Delta(y^i_t, y) = 1 - \frac{2|y^i_t \cap y|}{|y^i_t| + |y|}$). The desired constraint is:
$$f(X^i_{y^i_t}) \ge f(X^i_y) + \Delta(y^i_t, y) \quad \forall y \in \mathcal{Y}(t). \qquad (6)$$
This constraint should be enforced for all $t = 1, \dots, l^i$. As in the formulations of SVM and SOSVM, constraints are allowed to be violated by introducing slack variables, and we obtain the following learning formulation:
$$\operatorname*{minimize}_{w,\, b,\, \xi^i \ge 0}\ \frac{1}{2}\|w\|^2 + \frac{C}{n} \sum_{i=1}^{n} \xi^i, \qquad (7)$$
$$\text{s.t.}\quad f(X^i_{y^i_t}) \ge f(X^i_y) + \Delta(y^i_t, y) - \frac{\xi^i}{\mu\!\left(|y^i_t| / |y^i|\right)} \quad \forall i,\ \forall t = 1, \dots, l^i,\ \forall y \in \mathcal{Y}(t). \qquad (8)$$
Here $|\cdot|$ denotes the length function, and $\mu(|y^i_t| / |y^i|)$ is a function of the proportion of the event that has occurred at time $t$. $\mu(|y^i_t| / |y^i|)$ is a slack variable rescaling factor and should correlate with the importance of correctly detecting at time $t$ whether the event $y^i$ has happened. $\mu(\cdot)$ can be any
arbitrary nonnegative function, and in general, it should be a nondecreasing function in $(0, 1]$. In our experiments, we found the following piecewise linear function a reasonable choice: $\mu(x) = 0$ for $0 < x \le \alpha$; $\mu(x) = (x - \alpha)/(\beta - \alpha)$ for $\alpha < x \le \beta$; and $\mu(x) = 1$ for $\beta < x \le 1$ or $x = 0$. Here, $\alpha$ and $\beta$ are tunable parameters. $\mu(0) = \mu(1)$ emphasizes that true rejection is as important as true detection of the complete event.

Figure 3. The desired score function for early event detection: the complete event must have the highest detection score, and the detection score of a partial event must be higher than that of any segment that ends before the partial event. To learn this function, we explicitly consider partial events during training. At time t, the score of the truncated event (red segment) is required to be higher than the score of any segment in the past (e.g., blue segment); however, it is not required to be higher than the score of any future segment (e.g., green segment). This figure is best seen in color.

This learning formulation is an extension of SOSVM. From this formulation, we obtain SOSVM by not simulating the sequential arrival of training data, i.e., by setting $t = l^i$ instead of $t = 1, \dots, l^i$ in Constraint (8). Notably, our method does more than augment the set of training examples; it enforces the monotonicity of the detector function, as shown in Fig. 4.

For a better understanding of Constraint (8), let us analyze the constraint without the slack variable term and break it into three cases: i) $t < s^i$ (event has not started); ii) $t \ge s^i$, $y = \emptyset$ (event has started; compare the partial event against the detection threshold); iii) $t \ge s^i$, $y \ne \emptyset$ (event has started; compare the partial event against any non-empty segment). Recalling that $f(X_\emptyset) = 0$ and $y^i_t = \emptyset$ for $t < s^i$, cases (i), (ii), (iii) lead to Constraints (9), (10), (11), respectively:
$$f(X^i_y) \le -1 \quad \forall y \in \mathcal{Y}(s^i - 1) \setminus \{\emptyset\}, \qquad (9)$$
$$f(X^i_{y^i_t}) \ge 1 \quad \forall t \ge s^i, \qquad (10)$$
$$f(X^i_{y^i_t}) \ge f(X^i_y) + \Delta(y^i_t, y) \quad \forall t \ge s^i,\ y \in \mathcal{Y}(t) \setminus \{\emptyset\}. \qquad (11)$$
Constraint (9) prevents false detection when the event has not started. Constraint (10) requires successful recognition of partial events. Constraint (11) trains the detector to accurately localize the temporal extent of the partial events.

Figure 4. Monotonicity requirement: the detection score of a partial event cannot exceed the score of an encompassing partial event. MMED provides a principled mechanism to achieve this monotonicity, which cannot be assured by a naive solution that simply augments the set of training examples.

The proposed learning formulation Eq. (7) is convex, but it contains a large number of constraints. Following [7], we propose to use constraint generation in optimization, i.e., we maintain a smaller subset of constraints and iteratively update it by adding the most violated ones. Constraint generation is guaranteed to converge to the global minimum. In our experiments described in Sec. 4, this usually converges within iterations. Each iteration requires minimizing a convex quadratic objective. This objective is optimized using Cplex (www.ibm.com/software/integration/optimization/cplex-optimizer/) in our implementation.

3.2. Loss function and empirical risk minimization
In Sec. 3.1, we have proposed a formulation for training early event detectors. This section provides further discussion on what exactly is being optimized. First, we briefly review the loss of SOSVM and its surrogate empirical risk. We then describe two general approaches for quantifying the loss of a detector on sequential data. In both cases, what Eq. (7) minimizes is an upper bound on the loss.

As previously explained, $\Delta(y, \hat{y})$ is the function that quantifies the loss associated with a prediction $\hat{y}$, if the true output value is $y$. Thus, in the setting of offline detection, the loss of a detector $g(\cdot)$ on a sequence-event pair $(X, y)$ is quantified as $\Delta(y, g(X))$. Suppose the sequence-event pairs $(X, y)$ are generated according to some distribution $P(X, y)$; the loss of the detector $g$ is then $R^{\Delta}_{\text{true}}(g) = \int_{\mathcal{X} \times \mathcal{Y}} \Delta(y, g(X))\, dP(X, y)$. However, $P$ is unknown, so the performance of $g(\cdot)$ is described by the empirical risk
on the training data $\{(X^i, y^i)\}$, assuming they are generated i.i.d. according to $P$. The empirical risk is $R^{\Delta}_{\text{emp}}(g) = \frac{1}{n} \sum_{i=1}^{n} \Delta(y^i, g(X^i))$. It has been shown that SOSVM minimizes an upper bound on the empirical risk $R^{\Delta}_{\text{emp}}$ [7].

Due to the nature of continual evaluation, quantifying the loss of an online detector on streaming data requires aggregating the losses evaluated throughout the course of the data sequence. Let us consider the loss associated with a prediction $y = g(X^i_{[1,t]})$ for time series $X^i$ at time $t$ as $\Delta(y^i_t, y)\, \mu(|y^i_t| / |y^i|)$. Here $\Delta(y^i_t, y)$ accounts for the difference between the output $y$ and the true truncated event $y^i_t$. $\mu(|y^i_t| / |y^i|)$ is the scaling factor; it depends on how much of the temporal event $y^i$ has happened. Two possible ways of aggregating these loss quantities are to use their maximum or their average. They lead to two different empirical risks for a set of training time series:
$$R^{\Delta,\mu}_{\max}(g) = \frac{1}{n} \sum_{i=1}^{n} \max_{t} \left\{ \Delta(y^i_t, g(X^i_{[1,t]}))\, \mu\!\left(\frac{|y^i_t|}{|y^i|}\right) \right\},$$
$$R^{\Delta,\mu}_{\text{mean}}(g) = \frac{1}{n} \sum_{i=1}^{n} \operatorname*{mean}_{t} \left\{ \Delta(y^i_t, g(X^i_{[1,t]}))\, \mu\!\left(\frac{|y^i_t|}{|y^i|}\right) \right\}.$$
In the following, we state and prove a proposition that establishes that the learning formulation given in Eq. (7) minimizes an upper bound of the above two empirical risks.

Proposition: Denote by $\xi^{i*}(g)$ the optimal solution of the slack variables in Eq. (7) for a given detector $g$; then $\frac{1}{n} \sum_{i=1}^{n} \xi^{i*}$ is an upper bound on the empirical risks $R^{\Delta,\mu}_{\max}(g)$ and $R^{\Delta,\mu}_{\text{mean}}(g)$.

Proof: Consider Constraint (8) with $y = g(X^i_{[1,t]})$; together with the fact that $f(X^i_{g(X^i_{[1,t]})}) \ge f(X^i_{y^i_t})$, we have $\xi^{i*} \ge \Delta(y^i_t, g(X^i_{[1,t]}))\, \mu(|y^i_t| / |y^i|)\ \forall t$. Thus $\xi^{i*} \ge \max_t \{\Delta(y^i_t, g(X^i_{[1,t]}))\, \mu(|y^i_t| / |y^i|)\}$. Hence $\frac{1}{n} \sum_{i=1}^{n} \xi^{i*} \ge R^{\Delta,\mu}_{\max}(g) \ge R^{\Delta,\mu}_{\text{mean}}(g)$. This completes the proof of the proposition. This proposition justifies the objective of the learning formulation.

4. Experiments
This section describes our experiments on several publicly available datasets of varying complexity.

4.1. Evaluation criteria
This section describes several criteria for evaluating the accuracy and timeliness of detectors.
We used the area under the ROC curve for accuracy comparison, Normalized Time to Detection (NTtoD) for benchmarking the timeliness of detection, and F1 score for evaluating localization quality.

Area under the ROC curve: Consider testing a detector on a set of time series. The False Positive Rate (FPR) of the detector is defined as the fraction of time series on which the detector fires before the event of interest starts. The True Positive Rate (TPR) is defined as the fraction of time series on which the detector fires during the event of interest. A detector typically has a detection threshold that can be adjusted to trade off high TPR for low FPR and vice versa. By varying this detection threshold, we can generate the ROC curve, which is the function of TPR against FPR. We use the area under the ROC for evaluating the detector accuracy.

AMOC curve: To evaluate the timeliness of detection we used Normalized Time to Detection (NTtoD), which is defined as follows. Consider a testing time series in which the event of interest occurs from $s$ to $e$, and suppose the detector starts to fire at time $t$. For a successful detection, $s \le t \le e$, we define the NTtoD as the fraction of the event that has occurred, i.e., $\frac{t - s + 1}{e - s + 1}$. NTtoD is defined as 0 for a false detection ($t < s$) and $\infty$ for a false rejection ($t > e$). By adjusting the detection threshold, one can achieve lower NTtoD at the cost of higher FPR and vice versa. For a complete characteristic picture, we varied the detection threshold and plotted the curve of NTtoD versus FPR. This is referred to as the Activity Monitoring Operating Curve (AMOC) [].

F1-score curve: The ROC and AMOC curves, however, do not provide a measure for how well the detector can localize the event of interest. For this purpose, we propose to use the frame-based F1 scores. Consider running a detector on a time series. At time $t$ the detector outputs the segment $y$ while the ground truth (possibly truncated) event is $y^*$. The F1-score is defined as the harmonic mean of precision and recall values: $F1 := 2\, \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$, with $\text{Precision} := \frac{|y \cap y^*|}{|y|}$ and $\text{Recall} := \frac{|y \cap y^*|}{|y^*|}$.
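The two per-sequence measures defined above, NTtoD and the frame-based F1 score, can be sketched as follows. This is an illustrative sketch rather than the authors' evaluation code: the function names are hypothetical, NTtoD is assumed to be 0 for a false detection and infinite for a false rejection, a detector that never fires is treated as a false rejection, and F1 is taken to be 0 whenever the output and the ground truth do not overlap.

```python
import math
from typing import Optional, Tuple

Segment = Tuple[int, int]  # inclusive interval [start, end] in frames


def nttod(t_fire: Optional[int], s: int, e: int) -> float:
    """Normalized Time to Detection for an event spanning frames s..e.

    t_fire is the first frame at which the detector fires, or None if it
    never fires (assumption: counted as a false rejection).
    """
    if t_fire is None or t_fire > e:
        return math.inf  # false rejection (assumed value)
    if t_fire < s:
        return 0.0  # false detection (assumed value)
    return (t_fire - s + 1) / (e - s + 1)


def segment_length(seg: Optional[Segment]) -> int:
    """Number of frames in a segment; the empty segment (None) has length 0."""
    return 0 if seg is None else seg[1] - seg[0] + 1


def overlap(a: Optional[Segment], b: Optional[Segment]) -> int:
    """Number of frames shared by two segments."""
    if a is None or b is None:
        return 0
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def frame_f1(output: Optional[Segment], truth: Optional[Segment]) -> float:
    """Frame-based F1: harmonic mean of precision and recall."""
    shared = overlap(output, truth)
    if shared == 0:
        return 0.0  # convention assumed here for the no-overlap case
    precision = shared / segment_length(output)
    recall = shared / segment_length(truth)
    return 2 * precision * recall / (precision + recall)


# The detector fires at frame 14 for an event spanning frames 10..19.
print(nttod(14, 10, 19))  # → 0.5
```

Sweeping the detection threshold and recording the pair (FPR, mean NTtoD) at each setting would then trace out an AMOC curve; how NTtoD values are aggregated across sequences is left open here.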
For a new test time series, we can simulate the sequential arrival of data and record the F1 scores as the event of interest unrolls from % to %. We refer to this as the F1-score curve.

4.2. Synthetic data
We first validated the performance of MMED on a synthetically generated dataset of time series. Each time series contained one instance of the event of interest, signal 5(a).i, and several instances of other events, signals 5(a).ii-iv. Some examples of these time series are shown in Fig. 5(b). We randomly split the data into training and testing subsets of equal sizes. During testing we simulated the sequential arrival of data and recorded the moment that MMED started to detect the start of the event of interest. With % precision, MMED detected the event when it had completed 7.5% of the event. For comparison, SOSVM required observing 77.5% of the event for a positive detection. Examples of testing time series and results are depicted in Fig. 5(b). The events of interest are drawn in
green and the solid vertical red lines mark the moments that our method started to detect these events. The dashed vertical blue lines are the results of SOSVM. Notably, this result reveals an interesting capability of MMED. For the time series in this experiment, the change in signal values from 3 to is exclusive to the target events. MMED was trained to recognize partial events, it implicitly discovered this unique behavior, and it detected the target events as soon as this behavior occurred. In this experiment, we represented each time series segment by the L normalized histogram of signal values in the segment (normalized to have unit norm). We used linear SVM with C =, α =, β =.

Figure 5. Synthetic data experiment. (a): time series were created by concatenating the event of interest (i) and several instances of other events (ii)-(iv). (b): examples of testing time series; the solid vertical red lines mark the moments that our method starts to detect the event of interest while the dashed blue lines are the results of SOSVM.

4.3. Auslan dataset: Australian sign language
This section describes our experiments on a publicly available dataset [7] that contains 95 Auslan signs, each with 7 examples. The signs were captured from a native signer using position trackers and instrumented gloves; the location of two hands, the orientation of the palms, and the bending of the fingers were recorded. We considered detecting the sentence "I love you" in monologues obtained by concatenating multiple signs. In particular, each monologue contained an I-love-you sentence which was preceded and succeeded by 5 random signs. The I-love-you sentence was an ordered concatenation of random samples of three signs: "I", "love", and "you". We created training and testing monologues from disjoint sets of sign samples; the first 5 examples of each sign were used to create training monologues while the last examples were used for testing monologues. The average lengths and standard deviations of the monologues and the I-love-you sentences were 836 ± 38 and 58 ± 6 respectively.
Previous work [7] reported high recognition performance on this dataset using HMMs. Following their success, we implemented a continuous density HMM for I-love-you sentences. Our HMM implementation consisted of states, each was a mixture of Gaussians. To use the HMM for detection, we adopted a sliding window approach; the window size was fixed to the average length of the I-love-you sentences.

Inspired by the high recognition rate of HMM, we constructed the feature representation for SVM-based detectors (SOSVM and MMED) as follows. We first trained a Gaussian Mixture Model of Gaussians for the frames extracted from the I-love-you sentences. Each frame was then associated with a log-likelihood vector. We retained the top three values of this vector, zeroing out the other values, to create a frame-level feature representation. This is often referred to as a soft quantization approach. To compute the feature vector for a given window, we divided the window into two roughly equal halves, the mean feature vector of each half was calculated, and the concatenation of these mean vectors was used as the feature representation of the window.

A naive strategy for early detection is to use truncated events as positive examples. For comparison, we implemented Seg[0.5,1], a binary SVM that used the first halves of the I-love-you sentences in addition to the full sentences as positive training examples. Negative training examples were random segments that had no overlap with the I-love-you sentences.

We repeated our experiment times and recorded the average performance. Regarding the detection accuracy, all methods except SVM[0.5,1] performed similarly well. The ROC areas for HMM, SVM[0.5,1], SOSVM, and MMED were .97, .9, .99, and .99, respectively. However, when comparing the timeliness of detection, MMED outperformed the others by a large margin. For example, at % false positive rate, our method detected the I-love-you sentence when it observed the first 37% of the sentence. At the same false positive rate, the best alternative method required seeing 6% of the sentence.
The full AMOC curves are depicted in Fig. 6(a). In this experiment, we used linear SVM with C =, α =.5, β =.

4.4. Extended Cohn-Kanade dataset: expression
The Extended Cohn-Kanade dataset (CK+) [] contains 37 facial image sequences from 3 subjects performing one of seven discrete emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise. Each of the sequences contains images from onset (neutral frame) to peak expression (last frame). We considered the task of detecting negative emotions: anger, disgust, fear, and sadness. We used the same representation as [], where each frame is represented by the canonical normalized appearance feature, referred to as CAPP in []. For comparison purposes, we implemented two frame-based SVMs: Frm-peak was trained on peak frames of the training sequences while Frm-all was trained using all frames between the onset and offset of the facial action. Frame-based SVMs can be used for detection by classifying individual frames. In
contrast, SOSVM and MMED are segment-based. Since a facial expression is a deviation of the neutral expression, we represented each segment of an emotion sequence by the difference between the end frame and the start frame. Even though the start frame was not necessarily a neutral face, this representation led to good recognition results.

We randomly divided the data into disjoint training and testing subsets. The training set contained sequences with equal numbers of positive and negative examples. For reliable results, we repeated our experiment times and recorded the average performance. Regarding the detection accuracy, segment-based SVMs outperformed frame-based SVMs. The ROC areas (mean and standard deviation) for Frm-peak, Frm-all, SOSVM, MMED are .8 ± ., .8 ± .3, .96 ± ., and .97 ± ., respectively. Comparing the timeliness of detection, our method was significantly better than the others, especially at low false positive rate. For example, at % false positive rate, Frm-peak, Frm-all, SOSVM, and MMED can detect the expression when it completes 7%, 6%, 55%, and 7% respectively. Fig. 6(b) plots the AMOC curves, and Fig. 7 displays some qualitative results. In this experiment, we used a linear SVM with C =, α =, β =

Figure 6. Performance curves. (a, b): AMOC curves (Normalized Time to Detect versus False Positive Rate) on Auslan and CK+ datasets; at the same false positive rate, MMED detects the event of interest sooner than the others. (c): F1-score curves (F1 score versus fraction of the event seen) on Weizmann dataset; MMED provides better localization for the event of interest, especially when the fraction of the event observed is small. This figure is best seen in color.

4.5. Weizmann dataset: human action
The Weizmann dataset contains 9 video sequences of 9 people, each performing actions. Each video sequence in this dataset only consists of a single action.
Figure 7. Disgust (a) and fear (b) detection on CK+ dataset. From left to right: the onset frame, the frame at which MMED fires, the frame at which SOSVM fires, and the peak frame. The number in each image is the corresponding NTtoD.

To measure the accuracy and timeliness of detection, we performed experiments on longer video sequences which were created by concatenating existing single-action sequences. Following [5], we extracted binary masks and computed Euclidean distance transform for frame-level features. Frame-level feature vectors were clustered using k-means to create a codebook of temporal words. Subsequently, each frame was represented by the ID of the corresponding codebook entry and each segment of a time series was represented by the histogram of temporal words associated with frames inside the segment.

We trained a detector for each action class, but considered them one by one. We created 9 long video sequences, each composed of videos of the same person and had the event of interest at the end of the sequence. We performed leave-one-out cross validation; each cross validation fold trained the event detector on 8 sequences and tested it on the leave-out sequence. For the testing sequence, we computed the normalized time to detection at % false positive rate. This false positive rate was achieved by raising the threshold for detection so that the detector would not fire before the event started. We calculated the median normalized time to detection across 9 cross validation folds and averaged these median values across action classes; the resulting values for Seg[1], Seg[0.5,1], SOSVM, MMED are .6, .3, .6, and . respectively. Here Seg[1] was
More informationSVO: Fast SemiDirect Monocular Visual Odometry
SVO: Fast SemDrect Monocular Vsual Odometry Chrstan Forster, Mata Pzzol, Davde Scaramuzza Abstract We propose a semdrect monocular vsual odometry algorthm that s precse, robust, and faster than current
More informationMANY of the problems that arise in early vision can be
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 2, FEBRUARY 2004 147 What Energy Functons Can Be Mnmzed va Graph Cuts? Vladmr Kolmogorov, Member, IEEE, and Ramn Zabh, Member,
More informationBRNO UNIVERSITY OF TECHNOLOGY
BRNO UNIVERSITY OF TECHNOLOGY FACULTY OF INFORMATION TECHNOLOGY DEPARTMENT OF INTELLIGENT SYSTEMS ALGORITHMIC AND MATHEMATICAL PRINCIPLES OF AUTOMATIC NUMBER PLATE RECOGNITION SYSTEMS B.SC. THESIS AUTHOR
More informationDP5: A Private Presence Service
DP5: A Prvate Presence Servce Nkta Borsov Unversty of Illnos at UrbanaChampagn, Unted States nkta@llnos.edu George Danezs Unversty College London, Unted Kngdom g.danezs@ucl.ac.uk Ian Goldberg Unversty
More informationEnsembling Neural Networks: Many Could Be Better Than All
Artfcal Intellgence, 22, vol.37, no.2, pp.239263. @Elsever Ensemblng eural etworks: Many Could Be Better Than All ZhHua Zhou*, Janxn Wu, We Tang atonal Laboratory for ovel Software Technology, anng
More informationAsRigidAsPossible Shape Manipulation
AsRgdAsPossble Shape Manpulaton akeo Igarash 1, 3 omer Moscovch John F. Hughes 1 he Unversty of okyo Brown Unversty 3 PRESO, JS Abstract We present an nteractve system that lets a user move and deform
More informationTrueSkill Through Time: Revisiting the History of Chess
TrueSkll Through Tme: Revstng the Hstory of Chess Perre Dangauther INRIA Rhone Alpes Grenoble, France perre.dangauther@mag.fr Ralf Herbrch Mcrosoft Research Ltd. Cambrdge, UK rherb@mcrosoft.com Tom Mnka
More informationComplete Fairness in Secure TwoParty Computation
Complete Farness n Secure TwoParty Computaton S. Dov Gordon Carmt Hazay Jonathan Katz Yehuda Lndell Abstract In the settng of secure twoparty computaton, two mutually dstrustng partes wsh to compute
More informationVerification by Equipment or EndUse Metering Protocol
Verfcaton by Equpment or EndUse Meterng Protocol May 2012 Verfcaton by Equpment or EndUse Meterng Protocol Verson 1.0 May 2012 Prepared for Bonnevlle Power Admnstraton Prepared by Research Into Acton,
More informationEnergy Conserving Routing in Wireless Adhoc Networks
Energy Conservng Routng n Wreless Adhoc Networks JaeHwan Chang and Leandros Tassulas Department of Electrcal and Computer Engneerng & Insttute for Systems Research Unversty of Maryland at College ark
More informationA Structure for General and Specc Market Rsk Eckhard Platen 1 and Gerhard Stahl Summary. The paper presents a consstent approach to the modelng of general and specc market rsk as dened n regulatory documents.
More informationDo Firms Maximize? Evidence from Professional Football
Do Frms Maxmze? Evdence from Professonal Football Davd Romer Unversty of Calforna, Berkeley and Natonal Bureau of Economc Research Ths paper examnes a sngle, narrow decson the choce on fourth down n the
More informationJournal of International Economics
Journal of Internatonal Economcs 79 (009) 31 41 Contents lsts avalable at ScenceDrect Journal of Internatonal Economcs journal homepage: www.elsever.com/locate/je Composton and growth effects of the current
More informationAssessing health efficiency across countries with a twostep and bootstrap analysis *
Assessng health effcency across countres wth a twostep and bootstrap analyss * Antóno Afonso # $ and Mguel St. Aubyn # February 2007 Abstract We estmate a semparametrc model of health producton process
More informationcan basic entrepreneurship transform the economic lives of the poor?
can basc entrepreneurshp transform the economc lves of the poor? Orana Bandera, Robn Burgess, Narayan Das, Selm Gulesc, Imran Rasul, Munsh Sulaman Aprl 2013 Abstract The world s poorest people lack captal
More information