Bypassing Synthesis: PLS for Face Recognition with Pose, Low-Resolution and Sketch




Abhishek Sharma
Institute of Advanced Computer Science, University of Maryland, USA
bhokaal@umiacs.umd.edu

David W Jacobs
Institute of Advanced Computer Science, University of Maryland, USA
djacobs@cs.umd.edu

Abstract

This paper presents a novel way to perform multi-modal face recognition. We use Partial Least Squares (PLS) to linearly map images in different modalities to a common linear subspace in which they are highly correlated. PLS has previously been used effectively for feature selection in face recognition. We show both theoretically and experimentally that PLS can be used effectively across modalities. We also formulate a generic intermediate subspace comparison framework for multi-modal recognition. Surprisingly, we achieve high performance using only pixel intensities as features. We experimentally demonstrate the highest published recognition rates on the pose variations in the PIE data set, and also show that PLS can be used to compare sketches to photos, and to compare images taken at different resolutions.

1. Introduction

In face recognition, one often seeks to compare gallery images taken under one set of conditions to a probe image acquired differently. For example, in criminal investigations, we might need to compare mugshots to a sketch drawn by a sketch artist based on the verbal description of the suspect. Similarly, mugshots or passport photos might be compared to surveillance images taken from a different viewpoint. The probe image might also be of lower resolution (LR) compared to a gallery of high resolution (HR) images.

We propose a general framework that uses Partial Least Squares (PLS) [6] to perform recognition in a wide range of multi-modal scenarios. PLS has been used very effectively for face recognition, but in a different manner, with different motivation [7, 9, 0,, ]; our contribution is to show how and why PLS can be used for cross-modal recognition.
More generally, we argue for the applicability of linear projection to an intermediate subspace for multi-modal recognition, also pointing out the value of the Bilinear Model (BLM) [4] for face recognition, which also achieves state-of-the-art results on some problems. Experimental evaluation of our framework using PLS with pose variation has shown significant improvements in accuracy and run-time over the state of the art on the CMU PIE face data set [6]. For sketch-photo recognition, our method is comparable to the state of the art. We also illustrate the potential of our method to handle variation in resolution with a simple, synthetic example. In all three domains we apply exactly the same algorithm, and use the same, simple representation of images. Our generic approach performs either near or better than state-of-the-art approaches that have been designed for specific cross-modal conditions.

Figure 1: The basic overview of the proposed method. W_X and W_Y are projection matrices learned using PLS on X and Y. (Diagram labels: Space X with HR, Pose, Photo; Space Y with LR, Pose, Sketch; Linear Projection; Intermediate Correlated Projections; Nearest Neighbor Matching.)

Our approach matches probe and gallery images by linearly projecting them into an intermediate space where images with the same identity are highly correlated (Figure 1). We argue that for a variety of cross-modality recognition problems, such projections will exist and can be found using PLS and BLM. One consequence of our approach is that we do not need to synthesize an artificial gallery image from the probe image.

1.1. Related Work

There has been a huge amount of prior work on comparing images taken in different modalities, which we can only sample here. In much of this work, images taken in one modality are automatically converted to the second modality prior to comparison. For example, a holistic mapping [] is used to convert a photo image into a corresponding sketch image. In [, 3, 5] the authors have used local patch based mappings to convert images from one modality to the other for sketch-photo recognition. Since the mapping from one modality to the other is generally non-linear, local patch based approaches generally perform better than global ones because they approximate the non-linearity better. [7] is a holistic and [6, 8, 9] are local patch-wise approaches to hallucinate an HR face image from a given LR face image, and again a comparison reveals that local approaches performed better. For face recognition with pose and lighting variation [0, 3, 3], 3D knowledge of faces is used to warp an off-axis image to a frontal image, and to normalize lighting prior to comparison. These approaches may use representations that are specific to a domain, or may employ a more general, learning-based approach that typically requires corresponding patches in the training set [3, 0, 8, 3]. Our approach does not attempt to synthesize images of one modality from another. While excellent work has been done on synthesis, this may in principle be an ill-posed problem that is more difficult than simply comparing images taken in two different modalities.

A second approach is to compare images using a representation that is insensitive to changes in modality. For example, Klare et al [4] used SIFT feature descriptors and multi-scale local binary patterns to represent grayscale and sketch images of faces, then performed recognition based on this common representation. This approach worked well because both SIFT and LBP features extract gradient information that is approximately the same in both photo and sketch at corresponding positions. While some descriptors, such as SIFT, are robust across a range of variations in modalities, no single representation can be expected to handle all variations in modality.
Two prior methods are closer to our work in spirit, and have provided valuable inspiration. In [4] (BLM) the authors have used Singular Value Decomposition to derive a common content space for a set of different styles, and [] uses a probabilistic model to generate coupled subspaces for different poses. We discuss [4] further in the next section to provide motivation for our use of PLS, and we also compare experimentally to their representation. Recently, [4] used CCA to project images in different poses to a common subspace and compared them using probabilistic modeling. While related, our approach is different in several ways: we achieve strong results using simple pixel intensities, without probabilistic modeling of patches; we show theoretically why projection methods can handle pose variation; and we show that PLS can outperform CCA with pose variation.

2. Bilinear Model

Tenenbaum and Freeman [4] proposed a bilinear model of style and content. In cross-modal face recognition, the two modes correspond to two styles, and subject identity corresponds to content. They suggest methods of learning BLMs and using them for a variety of tasks, such as identifying the style of a new image with unfamiliar content, or generating novel images based on separate examples of the style and content. However, their approach also suggests that their content-style models can be used to obtain a style-invariant content representation, which can be used for classification of a sample in a different style. Following their asymmetric model, modality matrices $A^{m}$ can be learned by decomposing the matrix $Y$ (a matrix in which the same subject's images under different modalities are concatenated to make a long vector) using SVD as (see [4]):

$$Y = U S V^{T} = (US)V^{T} = AB \tag{1}$$

$A$ can be partitioned to give different modality models ($A^{m_1}$ and $A^{m_2}$); in our case $m_1$ and $m_2$ might represent two different poses, or sketch and photo, and so on. We know that matrix $U$ has the eigenvectors of $YY^{T}$ as its columns; denote the $i$-th eigenvector and associated eigenvalue as $u_i$ and $\lambda_i$ respectively.
Since $YY^{T}u_i = \lambda_i u_i$ and the $i$-th column $a_i$ of $A = US$ is a scaled copy of $u_i$, we have

$$a_i \propto (YY^{T})u_i = Y(Y^{T}u_i) = Y\alpha_i \tag{2}$$

Here $\alpha_i$ is a column vector with each element equal to the projection (inner product) of a training image on the eigenvector $u_i$:

$$\alpha_{ik} = y_k^{T} u_i \tag{3}$$

where $y_k$ is the $k$-th column of $Y$. Hence, each eigenvector $u_i$ and vector $a_i$ can be defined as a linear combination of the training images $y_k$. To get the models for different modalities we need to partition the vectors $a_i$ to yield $a_i^{m_1}$ and $a_i^{m_2}$, so from Eqn (2), ignoring the common scale factor, we get:

$$a_i^{m_1} = Y^{m_1}\alpha_i \tag{4-a}$$

$$a_i^{m_2} = Y^{m_2}\alpha_i \tag{4-b}$$

where $Y^{m_1}$ and $Y^{m_2}$ are the matrices with images under modalities $m_1$ and $m_2$ as their columns. Now let us project a subject's face images under the two modalities, denoted $f^{m_1}$ and $f^{m_2}$, on $a_i^{m_1}$ and $a_i^{m_2}$ to get the projection coefficients for $j = 1, 2$ as:

$$(a_i^{m_j})^{T} f^{m_j} = (Y^{m_j}\alpha_i)^{T} f^{m_j} = \sum_{k=1}^{K} \alpha_{ik}\,(y_k^{m_j})^{T} f^{m_j} = \alpha_i^{T}\big((Y^{m_j})^{T} f^{m_j}\big) = \alpha_i^{T}\gamma^{m_j} \tag{5}$$

Here, $K$ is the total number of subjects used in the training set to learn matrix $A$. Each element of the vector $\gamma^{m_j}$ is the inner product of the test image $f^{m_j}$ with a training set image $y_k^{m_j}$. For the BLM to work properly for recognition, the corresponding projection coefficients (for $j = 1, 2$) should be approximately the same. This requires that the vectors $\gamma^{m_j}$ be approximately the same for $j = 1, 2$ (Eqn 5), which demands that the inner product of the test image with every training image be nearly the same across modalities.

By using SVD, they capture the variation in the images, while their BLM ensures that images of the same content and different styles will project to the same coordinates in this basis. However, the BLM may not hold when the corresponding images are not well correlated. In such cases, it may create a representation that captures variation in the data at the expense of capturing the features that account for the correlation between images in different styles, as shown in Figure 2. In this toy problem, the x-coordinates of corresponding points in X and Y are the same and the y-coordinates are uncorrelated. Projection to the x-axis makes the data perfectly correlated but removes much of the variance. BL X/Y and PLS X/Y correspond to the projection directions found using BLM and PLS on the two sets of points X and Y. Note that PLS still finds directions which make the projections correlated, while the BLM mainly represents variance in Y and consequently fails to obtain the optimal X direction too.

Figure 2: Comparison of PLS and BLM [4] for the case when the data X and Y are not correlated (see text for details). Note the different scales on the x and y axes. (Plot legend: X, Y, BL X, BL Y, PLS X, PLS Y.)

This problem motivates our use of PLS for cross-modal recognition.

3. Partial Least Squares

Partial Least Squares analysis [6, 4 5] is a regression model that differs from Ordinary Least Squares regression by first projecting the regressors (input) and responses (output) onto a low dimensional latent linear subspace.
PLS chooses these linear projections such that the covariance between the latent scores of regressors and responses is maximized, and then it finds a linear mapping from the regressors' latent score to the responses' latent score. We apply PLS by using images from one modality as regressors and using corresponding images from a different modality as responses. In this way, we learn a linear projection for each modality that maps images into a common space in which they can be compared.

Partial Least Squares has previously been used for face recognition [7, 9, 0,, ]. We have been particularly motivated by the approach of [], which achieves excellent experimental results. However, these results use PLS in a quite different way than we do. They used PLS to find a regression function from image feature space to a binary label space for performing one-vs-all classification. In [7, 9, 0, ] PLS has been used to extract feature vectors in accordance with the label information. In this regard, it is very similar to Linear Discriminant Analysis (LDA), with the considerable advantage that given two classes, it can select an arbitrary number of linear features, rather than choosing a single linear projection. In contrast, we simply use pixel intensities as our features, and focus on the ability of PLS to map images from different modalities into a common space.

There are several variants of PLS analysis based on the factor model assumption and the iterative algorithm used to learn the latent space [6, 4]. Some of these variants facilitate the intuition behind PLS while some are faster than others, but the objective function for all of them is the same. In this paper, we have used the original NIPALS algorithm [] to develop intuitions and a variant of NIPALS given in [5] to learn the latent space.

3.1. Description of PLS

Let us suppose that we have n observations (input space), each of which is a p dimensional vector. In correspondence we have n observations lying in a q dimensional space as our output.
Let X be the regressor matrix and Y be the response matrix, where each row contains one observation, so X and Y are (n × p) and (n × q) matrices respectively. PLS models X and Y such that:

$$X = TP^{T} + E \tag{6-a}$$

$$Y = UQ^{T} + F \tag{6-b}$$

$$U = TD + H \tag{6-c}$$

T and U are (n × d) matrices of the d extracted PLS scores or latent projections. The (p × d) matrix P and the (q × d) matrix Q represent matrices of loadings, and the (n × p) matrix E, (n × q) matrix F and (n × d) matrix H are the residual matrices. D is a (d × d) diagonal matrix which relates the latent scores of X and Y. PLS works in a greedy way and finds a 1D projection of X and Y at each iteration. That is, it finds normalized basis vectors w and c such that the covariance between the score vectors t and u (columns of T and U) is maximized:

$$\max\big([\operatorname{cov}(t,u)]^{2}\big) = \max\big([\operatorname{cov}(Xw, Yc)]^{2}\big) \quad \text{s.t. } \|w\| = \|c\| = 1 \tag{7}$$

PLS iterates this process with a greedy algorithm to find multiple basis vectors that project X and Y to a higher dimensional space. It is interesting to compare this to the objective function of Canonical Correlation Analysis (CCA) to emphasize the difference between PLS and CCA. CCA tries to maximize the correlation between the latent scores:

$$\max\big([\operatorname{corr}(Xw, Yc)]^{2}\big) \tag{8}$$

where

$$\operatorname{corr}(a,b) = \frac{\operatorname{cov}(a,b)}{\sqrt{\operatorname{var}(a)\,\operatorname{var}(b)}} \tag{9}$$

Putting the expression from (9) into (7), we get the PLS objective function as:

$$\max\big([\operatorname{var}(Xw)]\cdot[\operatorname{corr}(Xw, Yc)]^{2}\cdot[\operatorname{var}(Yc)]\big) \quad \text{s.t. } \|w\| = \|c\| = 1 \tag{10}$$

It is clear from (10) that PLS tries to correlate the latent scores of regressor and response and also to capture the variation present in the regressor and response spaces. CCA only correlates the latent scores; hence CCA fails to generalize well to unseen testing points and, under some special conditions, even fails to differentiate between training samples in the latent space. BLM, on the other hand, attempts to capture variation in both spaces, as shown in Figure 2.

One toy condition where PLS will succeed and both BLM and CCA will fail to obtain meaningful directions can be stated as follows. Suppose we have two sets of 3D points X and Y, and $x_i^{j}$ and $y_i^{j}$ denote the $j$-th element of the $i$-th data point in X and Y.

Suppose that the first coordinates of all $x_i$ and $y_i$ are equal to a constant, i.e., $x_i^{1} = y_i^{1} = \text{const}$ and $\operatorname{Var}(X^{1}) = \operatorname{Var}(Y^{1}) = 0$.

The second coordinates are correlated with a coefficient ρ which is less than 1, and the variance present in the second coordinate is ψ, i.e., $\operatorname{corr}(X^{2}, Y^{2}) = \rho$ and $\operatorname{Var}(X^{2}) = \operatorname{Var}(Y^{2}) = \psi$.

The third coordinate is almost uncorrelated and its variance is much larger than ψ, i.e., $\operatorname{corr}(X^{3}, Y^{3}) \approx 0$ and $\operatorname{Var}(X^{3}), \operatorname{Var}(Y^{3}) \gg \psi$.

In this situation CCA will give the first coordinate as the principal direction, which projects all the data points in sets X and Y to a common single point in the latent space, rendering recognition impossible.
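The PLS side of this toy condition, and the factorization in Eqn (10), can be checked numerically. The following is only a sketch under stated assumptions: the sample size, the constant 3.0, and the values chosen for ρ, ψ and the large third-coordinate variance are illustrative and not taken from the paper. It uses the standard fact that, for a single component, the unit vectors maximizing cov(Xw, Yc) are the leading singular vectors of the cross-covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000                       # number of paired points (assumed)
rho, psi, big = 0.8, 1.0, 9.0   # assumed correlation and variances, big >> psi

# Coordinate 1: the same constant in both sets (zero variance).
x1 = np.full(n, 3.0)
y1 = np.full(n, 3.0)
# Coordinate 2: variance psi in both sets, correlation rho between them.
x2 = rng.normal(0.0, np.sqrt(psi), n)
y2 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, np.sqrt(psi), n)
# Coordinate 3: large variance, drawn independently for the two sets.
x3 = rng.normal(0.0, np.sqrt(big), n)
y3 = rng.normal(0.0, np.sqrt(big), n)

X = np.column_stack([x1, x2, x3])
Y = np.column_stack([y1, y2, y3])
Xc = X - X.mean(axis=0)         # column-centre both sets
Yc = Y - Y.mean(axis=0)

# First PLS direction pair: leading singular vectors of the
# cross-covariance matrix maximise cov(Xw, Yc) over unit vectors w, c.
cross_cov = Xc.T @ Yc / n
left, _, right_t = np.linalg.svd(cross_cov)
w, c = left[:, 0], right_t[0, :]

# Check Eqn (10): cov^2 = var(Xw) * corr^2 * var(Yc).
t, u = Xc @ w, Yc @ c
cov_tu = np.cov(t, u, bias=True)[0, 1]
corr_tu = np.corrcoef(t, u)[0, 1]
lhs = cov_tu**2
rhs = t.var() * corr_tu**2 * u.var()

print("PLS direction for X:", np.round(w, 3))
print("PLS direction for Y:", np.round(c, 3))
print("Eqn (10) holds numerically:", bool(np.isclose(lhs, rhs)))
```

With these settings both printed directions should be dominated by the second coordinate, which is the behaviour the text attributes to PLS. CCA and BLM are not implemented in this sketch.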
BLM will find a direction which is parallel to the third coordinate, which preserves the inter-set variance but loses all the correspondence. PLS, however, will opt for the second coordinate, which preserves variance (discrimination) as well as maintaining correspondence, which is crucial for our task of multi-modal recognition. PLS therefore strikes a balance between the objectives of Principal Component Analysis (PCA) and CCA.

It should be noted that the dimension of the regressor and response score vectors is the same and is equal to the number of extracted PLS bases. Hence, the latent representations of both regressor and response lie in the same vector space. Moreover, since PLS bases are such that the latent scores are highly correlated, it can be safely assumed that regressor and response latent scores are roughly embedded in a single linear manifold; thus a simple Nearest Neighbor metric will suffice for recognition.

3.2. Learning PLS bases

Consider the regressor and response data matrices X and Y (both column centered) defined in Section 3.1. We define the regression model as:

$$Y = XB + E = (XW)Z^{T} + E = TZ^{T} + E \tag{11}$$

The detailed step by step algorithm to obtain these variables is given in [5]. The MATLAB code to obtain W and Z can be found here: http://www.cs.umd.edu/~djacobs/pubs_files/pls_bases.m. Here, B is the (p × q) regression matrix from X to Y, W is the (p × d) projection matrix from X to the latent space, T is the latent score matrix of X, and Z is a (q × d) matrix representing the linear transformation from the d dimensional latent space to Y. So essentially we can project Y into the latent space and calculate its latent score U as:

$$U = YZ(Z^{T}Z)^{-1} \tag{12}$$

Please note that the matrices T, U and W are not the same matrices as in Section 3.1 but scaled versions of them, and the columns of W and Z (Eqn 11) are equivalent to w and c (Eqn 7).

4. When can PLS work?

We will use PLS to find linear projections w and c that map images taken in two modes into a common subspace. Equation (10) shows that PLS will seek w and c that tend to produce high levels of correlation in the projection of corresponding images from different modalities.
However, PLS cannot be expected to lead to effective recognition when such projections do not exist. In this section, we show some conditions in which projections of images from two modalities exist in which the projected images are perfectly correlated (and in fact equal). Then we show that these conditions hold for some interesting examples of cross-modality recognition. We should note that the existence of such projections is not sufficient to guarantee good recognition performance. We will assess the actual performance of PLS empirically, in the next section.

4.1. Existence of correlated projections

In a number of cases, images taken in two different modes can be viewed as different linear transformations of a single ideal object. Let I and J denote column vectors containing the pixels of corresponding images, taken in two modalities. We denote by R a matrix (or column vector) that contains an idealized version of I and J, such that we can write:

$$I = AR, \qquad J = BR \tag{13}$$

for some matrices A and B. We would like to know when it will be possible to find vectors w and c that project sets of images into a 1D space in which they are highly correlated. We consider a simpler case, looking at when the projections can be made equal. That is, when we can find w and c such that for any I and J satisfying Equation (13) we have:

$$w^{T}I = c^{T}J \;\Rightarrow\; w^{T}AR = c^{T}BR \tag{14-a}$$

$$w^{T}A = c^{T}B \tag{14-b}$$

Equation (14-a,b) can be satisfied with a nonzero projection if and only if the row spaces of A and B intersect in a nonzero vector, as the LHS of Eqn (14-b) is a linear combination of the rows of A, while the RHS is a linear combination of the rows of B. We now give some examples in which this condition holds.

4.2. High resolution vs. low resolution

For this situation, we can assume that the ideal image is just the high resolution image, so that A is simply the identity matrix, and I = R. J can then be obtained by smoothing R with a Gaussian filter and subsampling the result. Both operations can be represented in matrix form. Any convolution can be represented as a matrix multiplication. For this, the i-th row of B contains a vectorized Gaussian filter centered at the image location of the i-th pixel in R. B can subsample the result of this convolution by simply omitting rows corresponding to pixels that are not sampled. Now because A is the identity matrix, it has full rank, and its row space must intersect that of B.

4.3. Pose variation

We now consider the more challenging problem that arises when comparing two images taken of the same 3D scene from different viewpoints.
This raises problems of finding a correspondence between pixels in the two images, as well as accounting for occlusion. To work our way up to this problem, we first consider the case in which there exists a one-to-one correspondence between pixels in the images, with no occlusion.

Permutations: In this case, we can again suppose that A is the identity matrix. B will then be a permutation matrix, which changes the location of pixels without altering their intensities. A and B are both of full rank, and in fact have a common row space. So again, there exist w and c that will project I and J into a space where they are equal.

Stereo: We now consider a more general problem that is commonly solved by stereo matching. Suppose we represent a 3D object with a triangular mesh. Let R contain the intensities on all faces of the mesh that appear in either image. (We will assume that each pixel contains the intensity from a single triangle. More realistic rendering models could be handled with slightly more complicated reasoning.) Then, to generate images appropriately, A and B will be matrices in which each row contains a single 1 and is 0 otherwise. A (or B) may contain identical rows, if the same triangle projects to multiple pixels. The rank of A will be equal to the number of triangles that create intensities in I, and similarly for B. The number of columns in both matrices will equal the number of triangles that appear in either image. So their row spaces will intersect provided that the sum of their ranks is greater than the length of R, which occurs whenever the images contain projections of any common triangles.

As a toy example, we consider a small 1D stereo pair showing a dot in front of a planar background. We might have I = [7 8 5] and J = [7 3 5]. In this example we might have R = [7 8 3 5], with A and B the corresponding binary selection matrices, in which each row has a single entry equal to one that picks out the element of R seen at that pixel. It can be inferred from the example that the row spaces of A and B intersect; hence we expect PLS to work.

4.4. Comparing images to sketches

Finally, we note that our conditions may approximately hold in the relationship between images and sketches. This is because sketches often capture the edges, or high frequency components, of an image. A filter such as a Laplacian of a Gaussian produces an output that is similar to a sketch (e.g., Figure 1). Again, the ideal image can be the same as the intensity image, while the sketch image can be produced by a B that represents this convolution, satisfying our conditions.
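The condition of Eqn (14-b) can be explored numerically: all pairs (w, c) with w^T A = c^T B form the null space of the stacked matrix [A^T, -B^T]. The sketch below is illustrative only. The helper name `common_projections`, the five intensities in `R`, the choice of which elements each view sees, and the 3-tap blur kernel are all assumptions made for this example, not values from the paper. It assumes A and B have full row rank, so every nonzero null vector gives a nonzero projection.

```python
import numpy as np

def common_projections(A, B, tol=1e-10):
    """Basis of all (w, c) with w^T A = c^T B (A, B assumed full row rank)."""
    M = np.hstack([A.T, -B.T])        # M @ [w; c] = A^T w - B^T c
    _, s, Vt = np.linalg.svd(M)       # full SVD, so Vt is square
    rank = int(np.sum(s > tol))
    null = Vt[rank:].T                # columns span the null space of M
    return null[:A.shape[0]], null[A.shape[0]:]

# Stereo toy (hypothetical values): five scene elements, each view sees four.
R = np.array([7.0, 8.0, 2.0, 3.0, 5.0])
A = np.eye(5)[[0, 1, 2, 4]]           # view I misses element 3
B = np.eye(5)[[0, 2, 3, 4]]           # view J misses element 1
I, J = A @ R, B @ R

W, C = common_projections(A, B)
stereo_ok = bool(np.allclose(W.T @ I, C.T @ J))
print("common projection directions:", W.shape[1])
print("projections of I and J agree:", stereo_ok)

# Low-resolution case: A is the identity, B blurs and subsamples.
n = 8
blur = np.zeros((n, n))
for i in range(n):
    for k, g in zip((-1, 0, 1), (0.25, 0.5, 0.25)):   # assumed 3-tap kernel
        if 0 <= i + k < n:
            blur[i, i + k] = g
B_lr = blur[::2]                      # keep every second blurred pixel
R_hr = np.arange(1.0, n + 1.0)        # hypothetical HR signal, I = R
c_lr = np.ones(B_lr.shape[0])         # any c works because A = identity
w_lr = B_lr.T @ c_lr                  # then w = B^T c satisfies w^T A = c^T B
lr_ok = bool(np.isclose(w_lr @ R_hr, c_lr @ (B_lr @ R_hr)))
print("HR and LR projections agree:", lr_ok)
```

In the stereo toy the number of common directions equals the number of scene elements visible in both views, which matches the rank-counting argument in Section 4.3.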

5. PLS based multi-modal recognition

Given a problem of multi-modal recognition, such as face images in different poses or sketch and photo, we can learn the PLS bases on a training set using the iterative algorithm given in [5]. Then, using equations (11) and (12), we can project a pair of images of the same subject seen under two different modalities to the latent space to generate a pair of latent scores. Once the latent space scores are obtained we can do simple NN recognition. For practical purposes, we will simply calculate and store the latent projections of gallery images and compute the latent projection of the probe image online. In the next few sections we present our results on face recognition across poses, sketch-photo pairs, and low and high resolution pairs.

5.1. Pose invariant face recognition

To demonstrate the efficacy of the proposed algorithm we have used it for face recognition across poses. We have used the CMU PIE face database [6] to evaluate and compare the performance of our method with other approaches. This database consists of 13 poses with large pose variation. In the past, many researchers have used this dataset to evaluate their algorithms. The dataset is divided into training (subjects 1 to 34) and testing (subjects 35 to 68) subsets. PLS bases corresponding to each of the different pose pairs are learned using the training set, and recognition performance is evaluated on the testing set. For all the pose pairs we have used 30 PLS bases for our proposed method and 5 eigenvectors for the Bilinear Model; these values produce the best results. Since there are 13 poses, there are 156 gallery and probe pairs. Table 1 reports the accuracy for all of these cases.

For the purpose of comparison with other methods we have adopted two different protocols. Some methods have reported the accuracy in the form of Table 1 (34 face gallery with all possible pose pairs); for those the comparison has been done in Table 2. For others, comparison is done in Table 3, in accordance with their protocol.
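The pipeline described at the start of Section 5 (learn PLS bases on paired training data, project gallery and probe through Eqns (11) and (12), then match by nearest neighbour) can be sketched as follows. This is a generic NIPALS-style PLS with deflation, not the authors' MATLAB code or the exact variant of [5], and the function name `learn_pls_bases` is hypothetical. The data are synthetic: both "modalities" are assumed to be noisy linear images of a hidden identity vector, in the spirit of Eqn (13), and all sizes and the noise level are arbitrary choices.

```python
import numpy as np

def learn_pls_bases(X, Y, d):
    """Generic PLS sketch. X: (n, p), Y: (n, q), both column-centred.
    Returns W (p, d) and Z (q, d) such that Y ~ (X @ W) @ Z.T."""
    Xk, Yk = X.copy(), Y.copy()
    w_raw, loads, resp = [], [], []
    for _ in range(d):
        # Unit vector w maximising cov(Xk w, Yk c): leading left singular
        # vector of the current cross-product matrix.
        left, _, _ = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w = left[:, 0]
        t = Xk @ w                      # latent score for this component
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        z = Yk.T @ t / tt               # regression of Y on the score
        Xk = Xk - np.outer(t, p)        # deflate both blocks
        Yk = Yk - np.outer(t, z)
        w_raw.append(w); loads.append(p); resp.append(z)
    Wr, P, Z = (np.column_stack(m) for m in (w_raw, loads, resp))
    W = Wr @ np.linalg.inv(P.T @ Wr)    # so that T = X @ W on undeflated X
    return W, Z

# Synthetic paired data (all values below are assumptions).
rng = np.random.default_rng(1)
r, p_dim, q_dim, n_train, n_test, noise = 5, 40, 30, 60, 30, 0.01
A = rng.normal(size=(p_dim, r))         # modality 1 generator
B = rng.normal(size=(q_dim, r))         # modality 2 generator
ids_train = rng.normal(size=(n_train, r))
ids_test = rng.normal(size=(n_test, r))

def render(ids, G):
    return ids @ G.T + noise * rng.normal(size=(ids.shape[0], G.shape[0]))

X_tr, Y_tr = render(ids_train, A), render(ids_train, B)
X_te, Y_te = render(ids_test, A), render(ids_test, B)
mx, my = X_tr.mean(axis=0), Y_tr.mean(axis=0)

W, Z = learn_pls_bases(X_tr - mx, Y_tr - my, d=r)

gallery = (X_te - mx) @ W                           # T = XW, Eqn (11)
probe = (Y_te - my) @ Z @ np.linalg.inv(Z.T @ Z)    # U = YZ(Z^T Z)^-1, Eqn (12)

# Nearest-neighbour matching of each probe against the gallery.
dist2 = ((probe[:, None, :] - gallery[None, :, :]) ** 2).sum(axis=2)
pred = dist2.argmin(axis=1)
accuracy = float(np.mean(pred == np.arange(n_test)))
print("rank-1 accuracy on synthetic pairs:", accuracy)
```

On real face data the rows of X and Y would be vectorized pixel intensities of corresponding images, and the number of bases d would be chosen empirically, as the text does for each experiment.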
We cite the performance of other methods directly from the papers, except BLM and CCA, for which we have done all the experiments. It should be noted that, unlike [4], we have used CCA with simple pixel intensities without probabilistic modeling, i.e., as with PLS, to compare the strength of PLS and CCA under equal conditions. It is clear from the comparison that the proposed method is a significant improvement over prior methods. Note that on the two pose pairs reported in [], we perform somewhat less well than their method. However, it is notable that their method requires 4 hand-clicked points by a human operator. They then compare responses of Gabor filters in the area of these points. Our method requires alignment of face thumbnails using the eyes and mouth. Moreover, when they used simple intensity as the feature, their accuracy dropped significantly.

Table 2: Comparison of proposed method with others on 34 face gallery on CMU PIE dataset.

Method               Accuracy   Time per comparison
Eigenfaces [0]       6.6        < 0.005 seconds
FaceIt [0]           4.3        > 5 minutes
ELF [0]              66.3       > 5 minutes
Bilinear Model [4]   79.6       < 0.005 seconds
4ptSMD []            86.8       0.35 seconds
CCA                  87.35      < 0.005 seconds
Proposed             90.        0.0046 seconds

Table 3: Results under different settings, as per the results reported by different authors.

Method     Gallery   Probe                  Accuracy / proposed
PGFR [5]   c27       c05/37/5//9//4/34      86/93.4
FA []      c27       c05/                   95/90
LLR [3]    c27       c05/9/37//07/09        94.6/100
ELF [0]    c27       c05/9/37//07/09        89.8/100

In addition, some authors that do not use a training set have reported results using a gallery of 68 individuals. In particular, [8] has reported strong results in this setting. While we cannot compare directly to their results, we note that [] reports results for galleries of 68 and 34 faces. With a gallery of 68 faces, results in [] are considerably better than those of [8] (8.4% vs. 74.3%), and with a gallery of 34 faces, our results are substantially better than those of [] (90.% vs. 86.8%).
We note that our approach does require prior knowledge of the pose of the probe image, and a training set that contains example faces taken in a similar pose. A similar assumption is made in the ELF [10] algorithm. [18] makes use of hand-clicked points and a morphable model to compute face pose, while [11] uses hand-clicked points to compute the epipolar geometry relating the two images. Research and commercial systems have shown impressive performance in automatically computing pose. Some preliminary experiments with the proposed method showed that recognition performance does not decrease drastically with a slight change in pose between the PLS bases and the gallery/probe faces. Exploring this aspect thoroughly, with evaluation using automatic pose identifiers, is left to future work.

Table 1: Accuracy for all possible pose pairs on the CMU PIE dataset using the proposed method. The overall accuracy across all pose pairs is 90.%.

Probe/Gallery c34 c3 c4 c c9 c09 c7 c07 c05 c37 c5 c0 c Avg
c34 -- 0.88 0.94 0.94 0.9 0.88 0.9 0.97 0.85 0.88 0.70 0.85 0.6 0.86
c3 0.85 -- 0.88 0.85 0.9 0.85 0.88 0.76 0.85 0.76 0.884
c4 0.97 -- 0.97 0.9 0.97 0.9 0.8 0.9 0.67 0.98
c 0.79 0.97 -- 0.88 0.97 0.97 0.85 0.88 0.67 0.96
c9 0.76 0.94 -- 0.85 0.9 0.73 0.933
c09 0.76 0.88 0.9 0.94 0.94 -- 0.97 0.94 0.9 0.88 0.8 0.79 0.70 0.87
c7 0.85 0.9 0.97 -- 0.85 0.88 0.79 0.939
c07 0.79 0.9 0.97 0.97 -- 0.97 0.85 0.9 0.76 0.99
c05 0.79 0.97 0.97 0.94 0.94 -- 0.97 0.9 0.9 0.8 0.936
c37 0.79 0.94 0.94 0.94 0.88 0.94 0.94 0.97 -- 0.94 0.94
c5 0.67 0.8 0.76 0.79 0.88 0.88 0.88 0.9 0.94 0.97 -- 0.97 0.76 0.855
c0 0.76 0.88 0.88 0.94 0.94 0.88 0.97 0.94 -- 0.97 0.93
c 0.64 0.70 0.64 0.79 0.76 0.67 0.8 0.8 0.85 0.9 0.85 0.9 -- 0.784

5.2. Low resolution face recognition

This is yet another multi-modal problem, because probe images from a surveillance camera are generally low resolution (LR), with slight motion blur and noise, while the gallery generally contains high resolution (HR) faces. To verify the applicability of our method, we synthetically generated low resolution images from frontal face images in a subset of the FERET face dataset and performed recognition. The original HR images were chosen to be 76 × 66, and LR images of different sizes were tested for recognition. Fig. 3 shows the recognition accuracy of the proposed method. Note that a direct comparison of HR and LR face images with a resolution as low as 5 × 4 resulted in 60% recognition accuracy. Moreover, the number of PLS bases needed for optimal performance was never greater than 0, and in some cases just 3 PLS bases gave 95% accuracy. We used 90 faces for training and 00 for testing. Due to lack of space we have not shown the results for BLM, but it should be noted that it performed similarly. However, the performance of CCA was very poor, ranging only between 30% and 50%.

Figure 3: Accuracy for low resolution face recognition vs. the number of PLS bases used, for LR images of different sizes. (Axes: Recognition Accuracy vs. PLS Bases Used; curves for LR sizes 38 by 33, 9 by 6, 4 by, 7 by 6 and 5 by 4.)

5.3.
Sketch-Photo recognition

To demonstrate the generality of our proposed approach we have also tested it on a sketch vs. photo recognition problem, using a subset of the CUHK sketch face dataset [3]. The subset contains the face images of 88 subjects and the corresponding hand-drawn sketches. 88 sketch-photo pairs were used as the training sample and the remaining 00 were used as the testing set. We formed 5 random partitions of the dataset to generate different sets of training and testing data, and we report the average accuracy. In this case, we used 70 PLS bases, and 50 eigenvectors for the Bilinear Model. A comparison of our method with other reported results is shown in Table 4. From the comparison it is clear that, in spite of being holistic in nature, the proposed method achieves respectable accuracy. We feel that this is encouraging because our method is completely general; we have used exactly the same algorithm for pose, LR face recognition and sketch. The table also reflects the trend that accuracy increases steadily as we move down from holistic to pixel-level representations, so it may be possible that using patch-wise features with our method will improve the accuracy. It should be noted that in [5] and [4] the authors used strong classifiers after extracting patch-wise and pixel-based features, whereas we have simply used the NN metric after latent score extraction.

Table 4: Sketch-photo pair recognition accuracy.

Method      Testing set   Type         Accuracy
Wang []     00            Holistic     8
Liu [5]     300           Patch-wise   87.67
Klare [4]   300           Pixel-wise   99.47
Proposed    00            Holistic     93.6
Bilinear    00            Holistic     94.
CCA         00            Holistic     94.6

6. Conclusions

We have demonstrated a general latent space framework for cross-modal recognition and the relevance of PLS to cross-modal face recognition. Theoretically, we have shown that, in principle, there exist linear projections of images taken in two modalities that map them to a space in which images of the same individual are equal. This is true for images taken in different poses, at different resolutions, and, approximately, for sketches and intensity images. Experimentally, we show that PLS and BLM can be used to achieve strong face recognition performance in these domains. Of particular note, we show that PLS outperforms the best reported performance on face recognition with pose variation by an impressive margin, in terms of both accuracy and run-time, and that Bilinear Models outperform many existing approaches in all three domains. Moreover, using the exact same method we have also achieved comparable performance for sketch-photo and cross-resolution face recognition.

Acknowledgements

This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory (ARL). All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government. The authors would also like to thank Raghuraman for helping with the cropped images of the CMU PIE data. The first author is grateful to Christine and Marcello for helping out with TA stuff, giving him ample time for research.

References

[1] X. Tang and X. Wang, Face Sketch Recognition, IEEE Transactions on Circuits Systems for Video Technology, 4(), 50-57, 2004.
[2] B. Xiao, X. Gao, D. Tao, Y. Yuan and J. Li, Photo-sketch synthesis and recognition based on subspace learning, Neurocomputing 73, 840-85, 00.
[3] X. Wang and X. Tang, Face photo-sketch synthesis and recognition, IEEE Transactions Pattern Analysis Machine Intelligence, 3(), 955-967, 2009.
[4] B. Klare, Z. Li and A. K. Jain, Matching forensic sketches to mugshot photos, IEEE Pattern Analysis and Machine Intelligence, 9 Sept. 00.
[5] Q. Liu, X. Tang, H. Jin, H. Lu, S. Ma, Nonlinear Approach for Face Sketch Synthesis and Recognition, IEEE CVPR, 005-00, 2005.
[6] C. Liu, H. Y. Shum and W. T. Freeman, Face hallucination: theory and practice, IJCV 75(), 5-34, 2007.
[7] J. Yang, H. Tang, Y.
Ma and T. Huang, Face hallucination via sparse coding, IEEE Int. Conf. Image Processing 08, 64-67, 2008.
[8] B. Li, H. Chang, S. Shan and X. Chen, Aligning coupled manifolds for face hallucination, IEEE Signal Processing Letters, 6(), 957-960, 2009.
[9] Y. Zhuang, J. Zhang, and F. Wu, Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation, Pattern Recognition, 40, 378-394, 2007.
[10] R. Gross, I. Matthews, S. Baker, Appearance-based face recognition and light-fields, IEEE Trans. Pattern Anal. Mach. Intell. 6 (4), 449-465, 2004.
[11] C.D. Castillo, D.W. Jacobs, Using stereo matching with general epipolar geometry for 2D face recognition across pose, Pattern Analysis and Machine Intelligence, 3(), 98-304, 2009.
[12] S.J.D. Prince, J.H. Elder, J. Warrell, F.M. Felisberti, Tied Factor Analysis for Face Recognition across Large Pose Differences, IEEE Patt. Anal. Mach. Intell., 30(6), 970-984, 2008.
[13] X. Chai, S. Shan, X. Chen and W. Gao, Locally linear regression for pose invariant face recognition, IEEE Tran. Image Processing, 6(7), 76-75, 2007.
[14] J. B. Tenenbaum, W. T. Freeman, Separating style and content with bilinear models, Neural Computation (6), 47-83, 2000.
[15] X. Liu, T. Chen, Pose-robust face recognition using geometry assisted probabilistic modeling, IEEE CVPR, vol., 2005, pp. 50-509.
[16] R. Rosipal and N. Krämer, Overview and recent advances in partial least squares, in Subspace, latent structure and feature selection techniques, Lecture Notes in Computer Science, Springer, 34-5, 2006.
[17] C. Dhanjal, S. R. Gunn and J. S. Taylor, Efficient sparse kernel feature extraction based on partial least squares, IEEE Patt. Anal. Mach. Intell. 3(8), 947-96, 2009.
[18] S. Romdhani, V. Blanz, and T. Vetter, Face Identification by Fitting a 3d Morphable Model Using Linear Shape and Texture Error Functions, Proc. ECCV, 4, 3-9, 00.
[19] J. Baek and M. Kim, Face recognition using partial least squares components, Pattern Recognition, 37, 303-306, 2004.
[20] V. Struc, N. Pavesic, Gabor-based kernel partial-least-squares discrimination features for face recognition, Informatica, 0(), 2009.
[21] X. Li, J. Ma and S. La, Novel face recognition method based on a principal component analysis and kernel partial least square, IEEE ROBIO 2007, 773-777.
[22] W.R. Schwartz, H. Guo, L.S. Davis, A Robust and Scalable Approach to Face Identification, ECCV 00.
[23] S. Romdhani, T. Vetter, D. J. Kriegman, Face recognition using 3-D models: pose and illumination, Proc. of IEEE, 94(), 977-999, 2006.
[24] A. Li, S. Shan, X. Chen, W. Gao, Maximizing Intra-individual Correlations for Face Recognition Across Pose Differences, IEEE CVPR, 2009, pp. 605-6.
[25] Partial Least Square Tutorial, http://www.statsoft.com/textboo/partal-leastsquares/#simpls.
[26] T. Sim, S. Baker, and M. Bsat, The CMU Pose, Illumination, and Expression Database, IEEE Trans. Patt. Anal. Machine Intelligence, 5(), 65-68, 2003.