Person Re-identification by Probabilistic Relative Distance Comparison



Person Re-identification by Probabilistic Relative Distance Comparison

Wei-Shi Zheng 1,2, Shaogang Gong 2, and Tao Xiang 2
1 School of Information Science and Technology, Sun Yat-sen University, China
2 School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
wszheng@ieee.org, sgg@eecs.qmul.ac.uk, txiang@eecs.qmul.ac.uk

Abstract

Matching people across non-overlapping camera views, known as person re-identification, is challenging due to the lack of spatial and temporal constraints and the large visual appearance changes caused by variations in view angle, lighting, background clutter and occlusion. To address these challenges, most previous approaches aim to extract visual features that are both distinctive and stable under appearance changes. However, most visual features and their combinations under realistic conditions are neither stable nor distinctive and thus should not be used indiscriminately. In this paper, we propose to formulate person re-identification as a distance learning problem, which aims to learn the optimal distance that maximises matching accuracy regardless of the choice of representation. To that end, we introduce a novel Probabilistic Relative Distance Comparison (PRDC) model, which differs from most existing distance learning methods in that, rather than minimising intra-class variation whilst maximising inter-class variation, it aims to maximise the probability of a pair of true match having a smaller distance than that of a wrong match pair. This makes our model more tolerant to appearance changes and less susceptible to model over-fitting. Extensive experiments are carried out to demonstrate that 1) by formulating the person re-identification problem as a distance learning problem, notable improvement on matching accuracy can be obtained against conventional person re-identification techniques, which is particularly significant when the training sample size is small; and 2) our PRDC outperforms not only existing distance learning methods but also alternative learning methods based on boosting and learning to rank.

1. Introduction

There has been an increasing interest in matching people across disjoint camera views in a multi-camera system, known as the person re-identification problem [10, 7, 14, 8, 3]. (Most of this work was done when the first author was at QMUL.) For understanding the behaviour of people in a large area of public space covered by multiple non-overlapping cameras, it is critical that when a target disappears from one view, he/she can be identified in another view among a crowd of people. Despite the best efforts of computer vision researchers in the past 5 years, the person re-identification problem remains largely unsolved. Specifically, in a busy uncontrolled environment monitored by cameras from a distance, person verification relying upon biometrics such as face and gait is infeasible or unreliable. Without accurate temporal and spatial constraints given the typically large gaps between camera views, visual appearance features alone, extracted mainly from clothing, are intrinsically weak for matching people (e.g. most people in winter wear dark clothes). In addition, a person's appearance often undergoes large variations across different camera views due to significant changes in view angle, lighting, background clutter and occlusion (see Fig. 1), resulting in different people appearing more alike than images of the same person across different camera views (see Figs. 4 and 5).

Figure 1. Typical examples of appearance changes caused by cross-view variations in view angle, lighting, background clutter and occlusion. Each column shows two images of the same person from two different camera views.

Most existing studies have tried to address the above problems by seeking a more distinctive and stable feature representation of people's appearance, ranging widely from colour histogram [10, 7], graph model [4], spatial co-occurrence representation model [14], principal axis histogram [8], rectangle region histogram [2], to combinations of multiple features [7, 3]. After feature extraction, existing methods simply choose a standard distance measure such as the l1-norm [14], an l2-norm based distance [8], or the Bhattacharyya distance [7]. However, under severe viewing condition changes that can cause significant intra-object appearance variation (e.g. view angle, lighting, occlusion), computing a set of features that are both distinctive and stable under all condition changes is extremely hard if not impossible under realistic conditions. Moreover, given that certain features could be more reliable than others under a certain condition, applying a standard distance measure is undesirable as it essentially treats all features equally without discarding bad features selectively in each individual matching circumstance.

In this paper, we propose to formulate person re-identification as a distance learning problem which aims to learn the optimal distance metric that maximises matching accuracy regardless of the choice of representation. To that end, a novel Probabilistic Relative Distance Comparison (PRDC) model is proposed. The objective function used by PRDC aims to maximise the probability of a pair of true match (i.e. two true images of person A) having a smaller distance than that of a pair of related wrong match (i.e. two images of person A and B respectively). This is in contrast with that of a conventional distance learning approach, which aims to minimise intra-class variation in an absolute sense (i.e. making all images of person A more similar) whilst maximising inter-class variation (i.e. making all images of person A and B more dissimilar).

Our approach is motivated by the nature of our problem. Specifically, the person re-identification problem has three characteristics: 1) the intra-class variation can be large and, importantly, can vary significantly for different classes as it is caused by different condition changes (see Fig. 1); 2) the inter-class variation also varies drastically across different pairs of classes; and 3) annotating matched people across camera views is tedious and typically only a limited number of classes (people) are available for training, with each class containing only a handful of images of a person from different camera views (i.e. under-sampling for building a representative class distribution). By exploring a relative distance comparison model probabilistically, our model is more tolerant to the large intra/inter-class variation and the severe overlapping of different classes in a multi-dimensional feature space. Furthermore, due to the third characteristic of under-sampling, a model could easily be over-fitted if it is learned by minimising intra-class distance and maximising inter-class distance simultaneously by brute force. In contrast, our approach is able to learn a distance with much reduced complexity, thus alleviating the over-fitting problem, as validated by our extensive experiments.

Related work. Although it has not been exploited for person re-identification, distance learning is a well-studied problem with a large number of methods reported in the literature [16, 5, 17, 7, 17, 12, 15, 9, 1]. However, most of them suffer from the over-fitting problem as explained above. Recently, a few approaches attempt to alleviate the problem by incorporating the idea of relative distance comparison, as our PRDC model does [12, 15, 9]. However, in these works, the relative distance comparison is not quantified probabilistically, and importantly it is used as an optimisation constraint rather than as the objective function. Therefore these approaches, either implicitly [12, 9] or explicitly [15], still aim to learn a distance by which each class becomes more compact whilst being more separable from the others in an absolute sense. We demonstrate through experiments that they remain susceptible to over-fitting for person re-identification.

There have been a couple of feature selection based methods proposed specifically for person re-identification [7, 11]. Gray et al. [7] proposed to use boosting to select a subset of optimal features for matching people. However, in a boosting framework, good features are only selected sequentially and independently in the original feature space, where different classes can be heavily overlapped. Such selection may not be globally optimal. Rather than selecting features individually and independently (local selection), we aim to learn an optimal distance measure for all features jointly via distance learning (global selection). An alternative global selection approach was developed based on RankSVM [11]. By formulating person re-identification as a ranking problem, the RankSVM approach shares the spirit of relative comparison in our model. Nevertheless, our approach is more principled and tractable than RankSVM in that 1) PRDC is a second-order feature selection approach whereas RankSVM is a first-order one which is not able to exploit correlations between different features; 2) although RankSVM alleviates the over-fitting problem by fusing a ranking error function with a large margin function in its objective function, the probabilistic formulation of our objective function makes PRDC more tolerant to large intra- and inter-class variations and data sparsity; 3) tuning the critical free parameter of RankSVM that determines the weight between the margin function and the ranking error function is computationally costly and can be sub-optimal given limited data. In contrast, our PRDC model does not have such a problem. We demonstrate the advantage of our approach over both the boosting [7] and RankSVM [11] based methods through experiments.

The main contributions of this work are two-fold. 1) We formulate the person re-identification problem as a distance learning problem, which leads to noteworthy improvement on re-identification accuracy. To the best of our knowledge, this has not been investigated before. 2) We propose a probabilistic relative distance comparison based method that overcomes the limitations of existing distance learning methods when applied to person re-identification.

2. Probabilistic Relative Distance Comparison for Person Re-identification

Let us formally cast the person re-identification problem into the following distance learning problem. For an image z of person A, we wish to learn a re-identification model to successfully identify another image z' of the same person captured elsewhere in space and time. This is achieved by learning a distance function f(·,·) so that f(z, z') < f(z, z''), where z'' is an image of any other person except A.

To that end, given a training set {(z_i, y_i)}, where z_i ∈ Z is a multi-dimensional feature vector representing the appearance of a person in one view and y_i is its class label (person ID), we define a pairwise set O = {O_i = (x_i^p, x_i^n)}, where each element of a pairwise datum O_i is itself computed using a pair of sample feature vectors. More specifically, x_i^p is a difference vector computed between a pair of relevant samples (of the same class/person) and x_i^n is a difference vector from a pair of related irrelevant samples, i.e. one of the two samples used for computing x_i^n is one of the two relevant samples used for computing x_i^p and the other is a mis-match from another class. The difference vector x between any two samples z and z' is computed by

    x = d(z, z'),  z, z' ∈ Z,  (1)

where d is an entry-wise difference function that outputs a difference vector between z and z'. The specific form of the function d will be described in Sec. 2.3.

Given the pairwise set O, a distance function f can be learned based on relative distance comparison so that the distance between a relevant sample pair (f(x_i^p)) is smaller than that between a related irrelevant pair (f(x_i^n)), that is, f(x_i^p) < f(x_i^n) for each pairwise datum O_i. To this end, we measure the probability of the distance between a relevant pair being smaller than that of a related irrelevant pair as:

    P(f(x_i^p) < f(x_i^n)) = (1 + exp{f(x_i^p) − f(x_i^n)})^{−1}.  (2)

We assume the events of distance comparison between a relevant pair and an irrelevant pair, i.e. f(x_i^p) < f(x_i^n), are independent. (Note that we do not assume the data are independent.) Then, based on the maximum likelihood principle, the optimal function f can be learned as follows:

    f* = argmin_f r(f, O),  r(f, O) = −log( Π_{O_i} P(f(x_i^p) < f(x_i^n)) ).  (3)

The distance function f is parameterised as a Mahalanobis (quadratic) distance function:

    f(x) = x^T M x,  M ⪰ 0,  (4)

where M is a semidefinite matrix. The distance learning problem thus becomes learning M using Eqn. (3). Directly learning M using semidefinite programming techniques is computationally expensive for high-dimensional data [15]. In particular, we found in our experiments that given a dimensionality of thousands, typical for visual object representation, a distance learning method based on learning M becomes intractable. To overcome this problem, we perform eigenvalue decomposition on M:

    M = A Λ A^T = W W^T,  W = A Λ^{1/2},  (5)

where the columns of A are orthonormal eigenvectors of M and the diagonals of Λ are the corresponding eigenvalues. Note that W is orthogonal. Therefore, learning a function f is equivalent to learning an orthogonal matrix W = (w_1, ..., w_l, ..., w_L) such that

    W* = argmin_W r(W, O),  s.t. w_i^T w_j = 0, i ≠ j,
    r(W, O) = Σ_{O_i} log(1 + exp{ ‖W^T x_i^p‖^2 − ‖W^T x_i^n‖^2 }).  (6)

2.1. An Iterative Optimisation Algorithm

It is important to point out that our optimisation criterion (6) may not be a convex optimisation problem under the orthogonality constraint, due to the relative comparison modelling. This means that deriving a global solution by directly optimising W is not straightforward. In this work we formulate an iterative optimisation algorithm to learn an optimal W, which also aims to seek a low-rank (non-trivial) solution automatically. This is critical for reducing the model complexity and thus overcoming the over-fitting problem given sparse data. Starting from an empty matrix, at iteration l a new estimated column w_l is added to W. The algorithm terminates after L iterations when a stopping criterion is met. Each iteration consists of two steps as follows:

Step 1. Assume that after l iterations, a total of l orthogonal vectors w_1, ..., w_l have been learned. To learn the next orthogonal vector w_{l+1}, let

    a_i^{l+1} = exp{ Σ_{j=0}^{l} ( ‖w_j^T x_i^{p,j}‖^2 − ‖w_j^T x_i^{n,j}‖^2 ) },  (7)

where we define w_0 = 0, and x_i^{p,l} and x_i^{n,l} are the difference vectors at the l-th iteration, defined as follows:

    x_i^{s,l} = x_i^{s,l−1} − w̄_{l−1} w̄_{l−1}^T x_i^{s,l−1},  s ∈ {p, n},  i = 1, ..., |O|,  l ≥ 1,  (8)

where w̄_{l−1} = w_{l−1} / ‖w_{l−1}‖. Note that we define x_i^{s,0} = x_i^s, s ∈ {p, n}.

Step 2. Obtain x_i^{p,l+1}, x_i^{n,l+1} by Eqn. (8). Let O^{l+1} = {O_i^{l+1} = (x_i^{p,l+1}, x_i^{n,l+1})}. Then, learn a new optimal projection w_{l+1} on O^{l+1} as follows:

    w_{l+1} = argmin_w r^{l+1}(w, O^{l+1}),  where
    r^{l+1}(w, O^{l+1}) = Σ_{O_i^{l+1}} log(1 + a_i^{l+1} exp{ ‖w^T x_i^{p,l+1}‖^2 − ‖w^T x_i^{n,l+1}‖^2 }).  (9)

We seek an optimal solution by a gradient descent method:

    w_{l+1} ← w_{l+1} − λ ∂r^{l+1}/∂w_{l+1},  λ ≥ 0,  (10)
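As an illustration only (not the authors' released code), the objective of Eqns. (2)-(6) and the iterative procedure of Algorithm 1 can be sketched in NumPy; a fixed gradient step `lr` stands in for the automatically determined step length, a single initialisation is used per iteration, and all function and variable names are ours:

```python
import numpy as np

def prdc_loss(W, Xp, Xn):
    """r(W, O) of Eqn. (6): sum_i log(1 + exp{||W^T x_i^p||^2 - ||W^T x_i^n||^2}).
    Xp, Xn: (n, d) stacks of relevant / related-irrelevant difference vectors."""
    dp = np.sum((Xp @ W) ** 2, axis=1)
    dn = np.sum((Xn @ W) ** 2, axis=1)
    return float(np.sum(np.logaddexp(0.0, dp - dn)))  # stable log(1 + e^x)

def learn_prdc(Xp, Xn, eps=1e-6, max_iter=20, gd_steps=200, lr=0.01):
    """Sketch of Algorithm 1: greedily add mutually orthogonal projections w_l."""
    n, d = Xp.shape
    Ws, prev_r = [], None
    a = np.ones(n)                      # prior weights a_i of Eqn. (7); exp{0} = 1
    for _ in range(max_iter):
        w = (Xn - Xp).mean(axis=0)      # initialisation of Eqn. (11)
        if np.linalg.norm(w) < 1e-10:   # difference space exhausted
            break
        for _ in range(gd_steps):       # gradient descent of Eqn. (10)
            gp, gn = Xp @ w, Xn @ w
            s = a * np.exp(gp ** 2 - gn ** 2)
            coef = s / (1.0 + s)        # per-pair logistic weight
            w = w - lr * 2.0 * ((coef * gp) @ Xp - (coef * gn) @ Xn)
        # Objective value r^{l+1}(w, O^{l+1}) of Eqn. (9).
        r = np.sum(np.log1p(a * np.exp((Xp @ w) ** 2 - (Xn @ w) ** 2)))
        if prev_r is not None and prev_r - r < eps:  # stopping rule of Eqn. (12)
            break
        prev_r = r
        Ws.append(w)
        a = a * np.exp((Xp @ w) ** 2 - (Xn @ w) ** 2)  # weight update, Eqn. (7)
        wb = w / np.linalg.norm(w)      # deflation of Eqn. (8): project out w
        Xp = Xp - np.outer(Xp @ wb, wb)
        Xn = Xn - np.outer(Xn @ wb, wb)
    return np.column_stack(Ws) if Ws else np.zeros((d, 0))
```

Because each iteration deflates the difference vectors onto the orthogonal complement of the previous projection, the columns of the returned W are orthogonal by construction, mirroring Theorem 1 below.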

where the gradient is

    ∂r^{l+1}/∂w_{l+1} = Σ_{O_i^{l+1}} 2 · [ a_i^{l+1} exp{ ‖w_{l+1}^T x_i^{p,l+1}‖^2 − ‖w_{l+1}^T x_i^{n,l+1}‖^2 } / (1 + a_i^{l+1} exp{ ‖w_{l+1}^T x_i^{p,l+1}‖^2 − ‖w_{l+1}^T x_i^{n,l+1}‖^2 }) ] · ( x_i^{p,l+1} (x_i^{p,l+1})^T − x_i^{n,l+1} (x_i^{n,l+1})^T ) w_{l+1},

and λ is a step length automatically determined at each gradient update step. Following the descent direction in Eqn. (10), the initial value of w_{l+1} for the gradient descent method is set to

    w_{l+1} = |O^{l+1}|^{−1} Σ_{O_i^{l+1}} ( x_i^{n,l+1} − x_i^{p,l+1} ).  (11)

Note that the update in Eqn. (8) removes from each sample x_i^{s,l−1} the information along w_{l−1}, since w̄_{l−1}^T x_i^{s,l} = 0, so that the next learned vector w_l will only quantify the part of the data left from the last step, i.e. x_i^{s,l}. In addition, a_i^{l+1} indicates the trend in the change of the distance measures for x_i^p and x_i^n over previous iterations and serves as a prior weight for learning w_{l+1}. The iteration of the algorithm (for l > 1) is terminated when the following criterion is met:

    r^l(w_l, O^l) − r^{l+1}(w_{l+1}, O^{l+1}) < ε,  (12)

where ε is a small tolerance value, set to 10^{−6} in this work. The algorithm is summarised in Algorithm 1.

Algorithm 1: Learning the PRDC model
Data: O = {O_i = (x_i^p, x_i^n)}, ε > 0
begin
    w_0 ← 0, w̄_0 ← 0;
    x_i^{s,0} ← x_i^s, s ∈ {p, n}; O^0 ← O;
    l ← 0;
    while true do
        Compute a_i^{l+1} by Eqn. (7);
        Compute x_i^{s,l+1}, s ∈ {p, n} by Eqn. (8);
        O^{l+1} ← {O_i^{l+1} = (x_i^{p,l+1}, x_i^{n,l+1})};
        Estimate w_{l+1} using Eqn. (9);
        if (l > 1) and (r^l(w_l, O^l) − r^{l+1}(w_{l+1}, O^{l+1}) < ε) then
            break;
        end
        l ← l + 1;
    end
end
Output: W = [w_1, ..., w_l]

2.2. Theoretical Validation

The following two theorems validate that the proposed iterative optimisation algorithm learns a set of orthogonal projections {w_l} that iteratively decrease the objective function in Criterion (6).

Theorem 1. The learned vectors w_l, l = 1, ..., L, are orthogonal to each other.

Proof. Assume that l − 1 orthogonal vectors {w_j}_{j=1}^{l−1} have been learned. Let w_l be the optimal solution of Criterion (9) at the l-th iteration. First, we know that w_l is in the range space of {x_i^{p,l}} ∪ {x_i^{n,l}} according to Eqns. (10) and (11) (this can be shown via the Lagrangian of Eqn. (9) for a non-zero w_l), i.e. w_l ∈ span{x_i^{s,l}, i = 1, ..., |O|, s ∈ {p, n}}. Second, according to Eqn. (8), we have

    w_j^T x_i^{s,j+1} = 0,  s ∈ {p, n},  j = 1, ..., l − 1,
    span{x_i^{s,l}} ⊆ span{x_i^{s,l−1}} ⊆ ... ⊆ span{x_i^{s,0}}.  (13)

Hence, w_l is orthogonal to w_j, j = 1, ..., l − 1.

Theorem 2. r(W^{l+1}, O) ≤ r(W^l, O), where W^l = (w_1, ..., w_l), l ≥ 1. That is, the algorithm iteratively decreases the objective function value.

Proof. Let w_{l+1} be the optimal solution of Eqn. (9). By Theorem 1, it is easy to prove that for any j ≥ 1, w_j^T x_i^{s,j} = w_j^T x_i^{s,0} = w_j^T x_i^s, s ∈ {p, n}. Hence we have

    r^{l+1}(w_{l+1}, O^{l+1}) = Σ_{O_i^{l+1}} log(1 + a_i^{l+1} exp{ ‖w_{l+1}^T x_i^{p,l+1}‖^2 − ‖w_{l+1}^T x_i^{n,l+1}‖^2 }) = r(W^{l+1}, O).

Also, r^{l+1}(0, O^{l+1}) = r(W^l, O). Since w_{l+1} is the minimal solution, we have r^{l+1}(w_{l+1}, O^{l+1}) ≤ r^{l+1}(0, O^{l+1}), and therefore r(W^{l+1}, O) ≤ r(W^l, O).

Since Criterion (9) may not be convex, a local optimum could be obtained in each iteration of our algorithm. However, even if the computation is trapped in a local minimum of Eqn. (9) at the (l+1)-th iteration, Theorem 2 is still valid if r^{l+1}(w_{l+1}, O^{l+1}) ≤ r^l(w_l, O^l); otherwise the algorithm will be terminated by the stopping criterion (12). To alleviate the local optimum problem at each iteration, multiple initialisations could also be deployed in practice.

2.3. Learning in an Absolute Data Difference Space

To compute the data difference vector x defined in Eqn. (1), most existing distance learning methods use the following entry-wise difference function

    x = d(z, z') = z − z'  (14)

to learn M = W W^T in the normal data difference space, denoted by DZ = { x_ij = z_i − z_j | z_i, z_j ∈ Z }. The learned distance function is thus written as:

    f(x_ij) = (z_i − z_j)^T M (z_i − z_j) = ‖W^T x_ij‖^2.  (15)

In this work, we compute the difference vector by the following entry-wise absolute difference function:

    x̃ = d(z, z') = |z − z'|,  x̃(k) = |z(k) − z'(k)|,  (16)

where z(k) is the k-th element of the sample feature vector. M is thus learned in an absolute data difference space, denoted by D̃Z = { x̃_ij = |z_i − z_j| | z_i, z_j ∈ Z }, and our distance function becomes:

    f(x̃_ij) = |z_i − z_j|^T M |z_i − z_j| = ‖W^T x̃_ij‖^2.  (17)

We now explain why learning in an absolute data difference space is more suitable for our relative comparison model. First, we note that

    | |z_i(k) − z_j(k)| − |z_i(k) − z_j'(k)| | ≤ | (z_i(k) − z_j(k)) − (z_i(k) − z_j'(k)) |,  (18)

hence we have |x̃_ij − x̃_ij'| ≤ |x_ij − x_ij'| entry-wise, where |·| denotes the entry-wise absolute value. As the entries involved are thus non-negative, we can prove

    ‖x̃_ij − x̃_ij'‖ ≤ ‖x_ij − x_ij'‖.  (19)

This suggests that the variation of x̃_ij, given the same sample space Z, is always less than that of x_ij. Specifically, if z_i, z_j, z_j' are from the same class, the intra-class variation is smaller in D̃Z than in DZ. On the other hand, if z_j and z_j' belong to a different class from z_i, the variation of the inter-class differences is also more compact in the absolute data difference space. Since the variations of both the relevant and irrelevant sample differences x^p and x^n are smaller, the distance function learned using Eqn. (6) will yield more consistent distance comparison results, therefore benefiting our PRDC model. Specifically, for the same semidefinite matrix M, the Cauchy inequality suggests upper(‖W^T (x̃_ij − x̃_ij')‖) ≤ upper(‖W^T (x_ij − x_ij')‖), where upper(·) is the upper bound operation. This indicates that in the latent subspace induced by W, the maximum variation of x̃_ij^T M x̃_ij is lower than that of x_ij^T M x_ij. We show the notable benefit of learning PRDC in an absolute data difference space in our experiments.

2.4. Feature Representation

Our PRDC model can be applied regardless of the choice of appearance feature representation of people. However, in order to benefit from the different and complementary information captured by different features, we start with a mixture of colour and texture histogram features similar to those used in [7] and let our model automatically discover an optimal feature distance. Specifically, we divided a person image into six horizontal stripes. For each stripe, the RGB, YCbCr and HSV colour features and two types of texture features extracted by Schmid and Gabor filters were computed and represented as histograms. In total, 29 feature channels were constructed for each stripe, and each feature channel was represented by a 16-dimensional histogram vector. Each person image was thus represented by a feature vector in a 2784-dimensional feature space Z. Since the features computed for this representation include low-level features widely used by existing person re-identification techniques, this representation is considered generic and representative.
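The stripe-histogram structure can be illustrated with a simplified sketch (our own code, not the authors' feature extractor) that uses only the RGB channels; the paper additionally uses YCbCr, HSV and Schmid/Gabor texture channels, for 29 channels in total and a 6 × 29 × 16 = 2784-dimensional vector:

```python
import numpy as np

def stripe_histograms(img, n_stripes=6, n_bins=16):
    """Divide an image into horizontal stripes and concatenate one
    L1-normalised 16-bin histogram per channel per stripe.
    img: (H, W, C) array with values in [0, 1]."""
    H = img.shape[0]
    bounds = np.linspace(0, H, n_stripes + 1).astype(int)  # stripe row limits
    feats = []
    for s in range(n_stripes):
        stripe = img[bounds[s]:bounds[s + 1]]
        for c in range(img.shape[2]):
            hist, _ = np.histogram(stripe[:, :, c], bins=n_bins, range=(0.0, 1.0))
            feats.append(hist / max(hist.sum(), 1))  # normalise each histogram
    return np.concatenate(feats)
```

With three colour channels this yields a 6 × 3 × 16 = 288-dimensional vector; adding the remaining 26 channels per stripe in the same way would reproduce the paper's 2784-dimensional representation.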
3. Experiments

Datasets and settings. Two publicly available person re-identification datasets, i-LIDS Multiple-Camera Tracking Scenario (MCTS) [18, 13] and VIPeR [6], were used for evaluation. The i-LIDS MCTS dataset, captured indoors at a busy airport arrival hall, contains 119 people with a total of 476 person images captured by multiple non-overlapping cameras, with an average of 4 images per person. Many of these images undergo large illumination changes and are subject to occlusions (see Fig. 4). The VIPeR dataset is the largest person re-identification dataset available, consisting of 632 people captured outdoors with two images per person. Viewpoint change is the most significant cause of appearance change, with most of the matched image pairs containing one front/back view and one side view (see Fig. 5).

In our experiments, for each dataset, we randomly selected all images of p people (classes) to set up the test set, and the rest were used for training. Each test set was composed of a gallery set and a probe set. The gallery set consisted of one image for each person, and the remaining images were used as the probe set. This procedure was repeated 10 times. During training, a pair of images of each person formed a relevant pair, and one image of him/her together with one of another person in the training set formed a related irrelevant pair; together they form the pairwise set O defined in Sec. 2.

For evaluation, we use the average cumulative match characteristic (CMC) curves [6] over the 10 trials to show the ranked matching rates. A rank-r matching rate indicates the percentage of probe images whose correct matches are found within the top r ranks against the p gallery images. The rank-1 matching rate is thus the correct matching/recognition rate. Note that in practice, although a high rank-1 matching rate is critical, the top-r ranked matching rate with a small r value is also important, because the top matched images will normally be verified by a human operator [6].

PRDC vs. Non-Learning based Distances. We first compared our PRDC with the non-learning based l1-norm distance and Bhattacharyya distance, which were used by most existing person re-identification work. Our results (Figs. 2 and 3, Tables 1 and 2) show clearly that with the proposed PRDC, the matching performance on both datasets is improved notably, more so when the number of people in the test pool increases (i.e. the training set size decreases). The improvement is particularly dramatic on the VIPeR dataset. In particular, Table 2 shows that a 4-fold increase in correct matching rate (r = 1) is obtained against both the l1-norm and Bhattacharyya distances when p = 316. The results validate the importance of performing distance learning. Examples of matching people using PRDC on both datasets are shown in Figs. 4 and 5 respectively.
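The rank-r matching rate described above can be computed from a probe-gallery distance matrix; a minimal sketch (our own helper, assuming one gallery image per person as in the protocol above):

```python
import numpy as np

def cmc_curve(dist, gallery_ids, probe_ids):
    """Cumulative match characteristic: cmc[r-1] is the fraction of probe
    images whose true identity appears within the top r ranked gallery
    matches. dist: (n_probe, n_gallery) distance matrix."""
    order = np.argsort(dist, axis=1)              # gallery indices, nearest first
    ranked = np.asarray(gallery_ids)[order]       # identities in ranked order
    hits = ranked == np.asarray(probe_ids)[:, None]
    first_hit = hits.argmax(axis=1)               # 0-based rank of the true match
    cmc = np.zeros(dist.shape[1])
    for r in first_hit:                           # accumulate matches up to each rank
        cmc[r:] += 1.0
    return cmc / len(probe_ids)
```

For example, cmc_curve(f_distances, gallery_ids, probe_ids)[0] is the rank-1 (correct recognition) rate, and averaging the curve over the 10 random trials gives the CMC curves reported in Figs. 2 and 3.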

Matchng Rate (%) 95 85 LIDS 75 PRDC 65 Adaboost 55 ITM LMNN 45 MCC 35 Xng s L1 Norm 25 1 5 10 15 20 25 30 Rank Score (a) p = 50 Matchng Rate (%) 90 80 70 60 50 40 30 20 LIDS 10 1 5 10 15 20 25 30 Rank Score (b) p = 80 Fgure 2. Performance comparson usng CMC curves on -LIDS MCTS dataset. Methods p = 30 p = 50 p = 80 r = 1 r = 5 r = 10 r = 20 r = 1 r = 5 r = 10 r = 20 r = 1 r = 5 r = 10 r = 20 PRDC 44.05 72.74 84.69 96.29 37.83 63.70 75.09 88.35 32.60 54.55 65.89 78.30 Adaboost 35.58 66.43 79.88 93.22 29.62 55.15 68.14 82.35 22.79 44.41 57.16 70.55 LMNN 33.68 63.88 78.17 92.64 27.97 53.75 66.14 82.33 23.70 45.42 57.32 70.92 ITM 36.37 67.99 83.11 95.55 28.96 53.99 70.50 86.67 21.67 41.80 55.12 71.31 MCC 40.24 73.64 85.87 96.65 31.28 59.30 75.62 88.34 12.00 33.66 47.96 67.00 Xng s 31.80 62.62 77.29 90.63 27.04 52.28 65.35 80.70 23.18 45.24 56.90 70.46 L1-norm 35.31 64.62 77.37 91.35 30.72 54.95 67.99 82.98 26.73 49.04 60.32 72.07 Bhat. 31.77 61.43 74.19 89.53 28.42 51.06 64.32 78.77 24.76 45.35 56.12 69.31 Table 1. Top ranked matchng rate (%) on -LIDS MCTS. p s sze of the gallery set (larger p means smaller tranng set) and r s the rank. PRDC vs. Alternatve Learnng Methods. We also compared PRDC wth 5 alternatve dscrmnant learnng based approaches. These nclude 4 popular dstance learnng methods, namely Xng s method [16], LMNN [15], ITM [1] and MCC [5], and a method specfcally desgned for person re-dentfcaton based on Adaboost [7]. Among the 4 dstance learnng methods, only LMNN explots relatve dstance comparson. But as mentoned n Sec. 1, t s used as an optmsaton constrant rather than the man objectve functon whch s also not formulated probablstcally. MCC s smlar to PRDC n that a probablstc model s used but t s not a relatve dstance comparson based method. Note that snce MCC needs to select the best dmenson for matchng, we performed cross-valdaton by selectng ts value n {[1 : 1 : 10], d}, where d s the maxmum rank MCC can learn. 
Among the 5, the only method that learns n an absolute data dfferent space s Adaboost. Our results (Fgs. 2 and 3, Tables 1 and 2) show clearly that our model yelds the best rank 1 matchng rate and overall much superor performance compared to the compared models. The advantage of PRDC s partcularly apparent when a tranng set s small (learnng becomes more dffcult) and a test set s large ndcated by the value of p (matchng becomes harder). Table 2 shows that on VIPeR when 100 people are used for learnng and 532 people for testng (p = 532), the correct matchng rate for PRDC (and MCC) s almost more than doubled aganst any alternatve dstance learnng methods. Partcularly, beneftng from beng a probablstc model, MCC gves the most comparable results to PRDC when the tranng set s large. However, ts performance degrades dramatcally when the sze of tranng data decreases (see columns under p = 80 n Table 1 and p = 532 n Table 2). Ths suggests that over-fttng to lmted tranng data s the man reason for the nferor performance of the compared alternatve learnng approaches. PRDC vs. RankSVM. Dfferent from PRDC, RankSVM has a free parameter whch determnes the relatve weghts between the margn functon and the rankng error functon [11]. In our experment, we cross-valdated the parameter n {0.0001, 0.005, 0.001, 0.05, 0.1, 0.5, 1, 10, 100, 1000}. As shown n Tables 3 and 4, the two methods all perform very well aganst other compared algorthms and our PRDC yelds overall better performance especally at lower rank matchng rate and gven less tranng data. The better performance of PRDC s due to the probablstc modellng and a second-order rather than frst-order feature selecton. It s also noted that tunng the free parameter for RankSVM s not a trval task and the performance can be senstve to the tunng especally gven sparse data, whle PRDC does not have ths problem. In addton RankSVM s computatonally more expensve (see detals later). Effect of learnng n an Absolute Data Dfference Space. 
We have shown in Sec. 2.3 that, in theory, our relative distance comparison learning method can benefit from learning in an absolute data difference space. To validate this experimentally, we compared PRDC with PRDC_raw, which learns in the normal data difference space D_Z (see Sec. 2.3). The results in Table 5 indicate that learning in an absolute data difference space does improve the matching performance. Note that most existing distance learning models are based on learning in the normal data difference space D_Z. It is possible to reformulate some of them in

[Figure 3: CMC curves (matching rate % vs. rank score) on VIPeR; (a) p = 316, (b) p = 532.]
Figure 3. Performance comparison using CMC curves on the VIPeR dataset.

Methods     p = 316                       p = 432                       p = 532
            r=1    r=5    r=10   r=20    r=1    r=5    r=10   r=20    r=1    r=5    r=10   r=20
PRDC        15.66  38.42  53.86  70.09   12.64  31.97  44.28  59.95    9.12  24.19  34.40  48.55
Adaboost     8.16  24.15  36.58  52.12    6.83  19.81  29.75  43.06    4.19  12.95  20.21  30.73
LMNN         6.23  19.65  32.63  52.25    5.14  13.13  20.30  33.91    4.04   9.68  14.19  21.18
ITM         11.61  31.39  45.76  63.86    8.38  24.54  36.81  52.29    4.19  11.11  17.22  24.59
MCC         15.19  41.77  57.59  73.39   11.30  32.43  47.29  62.85    5.00  16.32  25.92  39.64
Xing's       4.65  11.96  16.61  24.37    4.12  10.02  14.70  20.65    3.63   8.76  12.14  18.16
L1-norm      4.18  11.65  16.52  22.37    3.80   9.81  13.94  19.44    3.55   8.29  12.27  17.59
Bhat.        4.65  11.49  16.55  23.83    4.19  10.35  14.19  20.19    3.82   9.08  12.42  17.88
Table 2. Top ranked matching rate (%) on VIPeR. p is the number of classes in the testing set; r is the rank.

         PRDC                          RankSVM
         r=1    r=5    r=10   r=20    r=1    r=5    r=10   r=20
p = 30   44.05  72.74  84.69  96.29   42.96  71.30  85.15  96.99
p = 50   37.83  63.70  75.09  88.35   37.41  63.02  73.50  88.30
p = 80   32.60  54.55  65.89  78.30   31.73  55.69  67.02  77.78
Table 3. PRDC vs. RankSVM (%) on i-LIDS.

          PRDC                          RankSVM
          r=1    r=5    r=10   r=20    r=1    r=5    r=10   r=20
p = 316   15.66  38.42  53.86  70.09   16.27  38.23  53.73  69.87
p = 432   12.64  31.97  44.28  59.95   10.63  29.70  42.31  58.26
p = 532    9.12  24.19  34.40  48.55    8.87  22.88  32.69  45.98
Table 4. PRDC vs. RankSVM (%) on VIPeR.

Methods     i-LIDS (p = 50)               VIPeR (p = 316)
            r=1    r=5    r=10   r=20    r=1    r=5    r=10   r=20
PRDC        37.83  63.70  75.09  88.35   15.66  38.42  53.86  70.09
PRDC_raw    19.92  50.19  68.29  86.40   12.28  37.28  53.83  71.77
ITM_abs     29.16  53.01  66.75  82.53    5.44  14.43  22.53  33.35
MCC_abs      5.59  23.01  43.59  70.47    1.20   3.51   5.60   9.68
Table 5. Effect of learning in an absolute data difference space.

          i-LIDS MCTS                  VIPeR
          p = 30  p = 50  p = 80      p = 316  p = 432  p = 532
rank(W)   3.2     2.4     2.3         2.9      3.2      3.7
Table 6. Average rank of W learned by PRDC.

order to learn in an absolute data difference space. In Table 5 we show that when ITM and MCC are learned in the absolute data difference space, termed ITM_abs and MCC_abs respectively, their performance becomes worse compared with their results in Tables 1 and 2. This indicates that the absolute difference space is more suitable for our relative distance comparison learning.

Computational cost. Though PRDC is iterative, it has relatively low cost in practice. In our experiments, for VIPeR with p = 316, it took around 15 minutes on an Intel dual-core 2.93GHz CPU with 48GB RAM to learn PRDC for each trial. We observed that the low cost of PRDC is partially due to its ability to seek a suitably low rank for W (i.e. it converges within very few iterations), as shown in Table 6. For comparison, among the other compared methods, Adaboost is the most costly and took over 7 hours for each trial. For the 4 compared distance learning methods, PCA dimensionality reduction must be performed first; otherwise they become intractable given the high-dimensional feature space. For the RankSVM method, each trial took around 2.5 hours due to parameter tuning.

4. Conclusion

We have proposed a new approach for person re-identification based on probabilistic relative distance comparison, which aims to learn a suitable distance measure given large intra- and inter-class appearance variations and sparse data. Our experiments demonstrate that (1) by formulating person re-identification as a distance learning problem, a clear improvement in matching performance can be obtained, and the improvement is more significant when the training sample size is small; and (2) our PRDC outperforms not only existing distance learning methods but also alternative learning methods based on boosting and learning to rank.

Acknowledgements

This research was partially funded by the EU FP7 project SAMURAI with grant no. 217899. Dr.
Wei-Shi Zheng was also additionally supported by the 985 project at Sun Yat-sen University with grant no. 35000-3181305.

References

[1] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In ICML, 2007.

Figure 4. Examples of person re-identification on i-LIDS MCTS using PRDC. In each row, the left-most image is the probe, the images in the middle are the top 20 matched gallery images with a highlighted red box marking the correct match, and the right-most image shows a true match.
Figure 5. Examples of person re-identification on VIPeR using PRDC.

[2] P. Dollar, Z. Tu, H. Tao, and S. Belongie. Feature mining for image classification. In CVPR, 2007.
[3] M. Farenzena, L. Bazzani, A. Perina, M. Cristani, and V. Murino. Person re-identification by symmetry-driven accumulation of local features. In CVPR, 2010.
[4] N. Gheissari, T. Sebastian, and R. Hartley. Person reidentification using spatiotemporal appearance. In CVPR, 2006.
[5] A. Globerson and S. Roweis. Metric learning by collapsing classes. In NIPS, 2005.
[6] D. Gray, S. Brennan, and H. Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, 2007.
[7] D. Gray and H. Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, 2008.
[8] W. Hu, M. Hu, X. Zhou, J. Lou, T. Tan, and S. Maybank. Principal axis-based correspondence between multiple cameras for people tracking. PAMI, 28(4):663-671, 2006.
[9] J. Lee, R. Jin, and A. Jain. Rank-based distance metric learning: An application to image retrieval. In CVPR, 2008.
[10] U. Park, A. Jain, I. Kitahara, K. Kogure, and N. Hagita. ViSE: Visual search engine using multiple networked cameras. In ICPR, 2006.
[11] B. Prosser, W.-S. Zheng, S. Gong, and T. Xiang. Person re-identification by support vector ranking. In BMVC, 2010.
[12] M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In NIPS, 2004.
[13] UK Home Office. i-LIDS multiple camera tracking scenario definition. 2008.
[14] X. Wang, G. Doretto, T. Sebastian, J. Rittscher, and P. Tu. Shape and appearance context modeling. In ICCV, 2007.
[15] K. Weinberger, J. Blitzer, and L. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2006.
[16] E.
Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In NIPS, 2003.
[17] L. Yang, R. Jin, R. Sukthankar, and Y. Liu. An efficient algorithm for local distance metric learning. In AAAI, 2006.
[18] W.-S. Zheng, S. Gong, and T. Xiang. Associating groups of people. In BMVC, 2009.
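As an aside on evaluation, the rank-r matching rates and CMC curves reported in Tables 1-4 and Figures 2-3 can be computed directly from a probe-by-gallery distance matrix. The following is a minimal sketch (not the authors' evaluation code); the toy distance matrix and identity labels are illustrative, and it assumes each probe's true match is present in the gallery:

```python
import numpy as np

def cmc(dist, gallery_ids, probe_ids, max_rank=30):
    """Cumulative Matching Characteristic from a probe x gallery distance
    matrix: cmc[r-1] = fraction of probes whose correct gallery identity
    appears among the r closest gallery entries."""
    order = np.argsort(dist, axis=1)          # gallery indices, closest first
    ranked_ids = gallery_ids[order]           # identities in ranked order
    # 0-based rank of the first (true) match for each probe; assumes
    # every probe identity occurs in the gallery
    hit_rank = np.argmax(ranked_ids == probe_ids[:, None], axis=1)
    return np.array([(hit_rank < r).mean() for r in range(1, max_rank + 1)])

# toy example: 3 probes, 4 gallery identities, one true match each
gallery_ids = np.array([0, 1, 2, 3])
probe_ids = np.array([0, 1, 2])
dist = np.array([[0.1, 0.9, 0.8, 0.7],   # probe 0: true match ranked 1st
                 [0.5, 0.6, 0.2, 0.9],   # probe 1: true match ranked 3rd
                 [0.4, 0.3, 0.2, 0.1]])  # probe 2: true match ranked 2nd
curve = cmc(dist, gallery_ids, probe_ids, max_rank=4)  # [1/3, 2/3, 1, 1]
```

The rank-r columns in the tables above correspond to single points on this curve, scaled to percentages.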