Pattern Recognton Letters 24 (2003) 2159 2165 www.elsever.com/locate/patrec Usng mxture covarance matrces to mprove face and facal expresson recogntons Carlos E. Thomaz a, *, Duncan F. Glles a, Raul Q. Fetosa b a Department of Computng, Imperal College London, 180 Queen s Gate, London SW7 2BZ, UK b Department of Electrcal Engneerng, Catholc Unversty of Ro de Janero, r. Marques de Sao Vcente 225, Ro de Janero 22453-900, Brazl Abstract In several pattern recognton problems, partcularly n mage recognton ones, there are often a large number of features avalable, but the number of tranng examples for each pattern s sgnfcantly less than the dmenson of the feature space. Ths statement mples that the sample group covarance matrces often used n the Gaussan maxmum probablty classfer are sngular. A common soluton to ths problem s to assume that all groups have equal covarance matrces and to use as ther estmates the pooled covarance matrx calculated from the whole tranng set. Ths paper uses an alternatve estmate for the sample group covarance matrces, here called the mxture covarance, gven by an approprate lnear combnaton of the sample group and pooled covarance matrces. Experments were carred out to evaluate the performance of ths method n two bometrc classfcaton applcatons: face and facal expresson. The average recognton rates obtaned by usng the mxture covarance matrces were hgher than the usual estmates. Ó 2003 Elsever Scence B.V. All rghts reserved. Keywords: Face recognton; Facal expresson recognton; Gaussan maxmum probablty classfer; Mxture covarance matrx 1. Introducton A crtcal ssue for the Gaussan maxmum probablty classfer s the nverse of the sample group covarance matrces. Snce n practce these matrces are not known, estmates must be computed based on the observatons (pattern examples) avalable n a tranng set. In some applcatons, however, there are often a large * Correspondng author. E-mal addresses: cet@doc.c.ac.uk (C.E. Thomaz), dfg@ doc.c.ac.uk (D.F. Glles), raul@ele.puc-ro.br (R.Q. Fetosa). number of features avalable, but the number of tranng examples for each group s lmted and sgnfcantly less than the dmenson of the feature space. Ths mples that the sample group covarance matrces wll be sngular. Ths problem, whch s called a small sample sze problem (Fukunaga, 1990), s qute common n pattern recognton, partcularly n mage recognton where the number of features s very large. One way to overcome ths problem s to assume that all groups have equal covarance matrces and to use as ther estmates the weghtng average of each sample group covarance matrx, 0167-8655/03/$ - see front matter Ó 2003 Elsever Scence B.V. All rghts reserved. do:10.1016/s0167-8655(03)00085-0
2160 C.E. Thomaz et al. / Pattern Recognton Letters 24 (2003) 2159 2165 gven by the pooled covarance matrx calculated from the whole tranng set. The am of ths work s to nvestgate another estmate for the sample group covarance matrces, here called mxture covarance matrces, gven by a lnear combnaton of the sample group covarance matrx and the pooled covarance matrx. The mxture covarance matrces are based on the Hoffbeck and Landgrebe (1996) approach and have the property of havng the same rank as the pooled estmate, whle allowng a dfferent estmate for each group. Thus, the mxture estmate may result n hgher accuracy. In order to evaluate ths approach, two bometrc applcatons were consdered: face recognton and facal expresson recognton. The evaluaton used dfferent mage databases for each applcaton. A probablstc model was used to combne the well-known dmensonalty reducton technque called prncpal component analyss (PCA) and the Gaussan maxmum probablty classfer, and n ths way we could nvestgate the performance of the mxture covarance matrces on the recognton tasks referred to above. Experments carred out show that the mxture covarance estmates attaned the best performance n both applcatons. 2. Dmensonalty reducton In bometrc applcatons, the number of tranng samples s lmted and usually sgnfcantly less than the number of pxels of each mage. Thus the hgh-dmensonal space s very sparsely represented; makng the parameter estmaton qute dffcult a problem that s called the curse of dmensonalty (e.g., Bshop, 1997). One of the most successful approaches to the problem of creatng a low dmensonal mage representaton s based on PCA. PCA generates a set of orthonormal bass vectors, known as prncpal components, whch mnmzes the mean square reconstructon error and descrbes major varatons n the whole tranng set consdered. Instead of analysng the maxmum probablty classfer drectly on the face or facal expresson mages, PCA s appled frst to provde dmensonalty reducton. Many researchers have confrmed that the PCA representaton has good generalzaton ablty especally when the dstrbutons of each class are separated by the mean dfference (Krby and Srovch, 1990; Turk and Pentland, 1991; Zhao et al., 1998; Lu and Wechsler, 2000). However, even after reducton the feature space s stll often of hgher dmenson than the number of tranng samples. 3. Maxmum probablty classfer The basc problem n the decson theoretc methods for pattern recognton conssts of fndng a set of g dscrmnantfunctonsd 1 ðxþ; d 2 ðxþ;...; d g ðxþ, where g s the number of groups or classes, wth the decson rule such that f the p-dmensonal pattern vector x belongs to the class (1 6 6 g), then d ðxþ P d j ðxþ, for all 6¼ j and 1 6 j 6 g. The Bayes classfer desgned to maxmze the total probablty of correct classfcaton, where equal pror probabltes for all groups are assumed, corresponds to a set of dscrmnant functons equal to the correspondng probablty densty functons, that s, d ðxþ ¼f ðxþ for all classes (Johnson and Wchern, 1998). The most common probablty densty functon appled to pattern recognton systems s based on the Gaussan multvarate dstrbuton d ðxþ¼f ðxjl ;R Þ 1 ¼ exp 1 ð2pþ p=2 1=2 jr j 2 ðx l Þ T R 1 ðx l Þ ; ð1þ where l and R are the class populaton mean vector and covarance matrx respectvely. The notaton jj denotes the determnant of a matrx. In practce, however, the true values of the mean and the covarance matrx are seldom known and must be estmated from tranng samples. The mean s estmated by the usual sample mean l x ¼ 1 k X k x ;j ; ð2þ where x ;j s observaton j from class, and k s the number of tranng observatons from class. The
C.E. Thomaz et al. / Pattern Recognton Letters 24 (2003) 2159 2165 2161 covarance matrx s commonly estmated by the sample group covarance matrx defned as R S ¼ 1 X k ðx ;j x Þðx ;j x Þ T : ðk 1Þ ð3þ If we replace the true values of the mean and the covarance matrx n Eq. (1) by ther respectve estmates, the Bayes decson rule acheves optmal classfcaton accuracy only when the number of tranng samples ncreases toward nfnty (e.g., Hoffbeck and Landgrebe, 1996). In fact for p-dmensonal patterns the sample covarance matrx s sngular f less than p þ 1 ndependent tranng examples from each class are avalable, that s, the sample covarance matrx cannot be calculated f k s less than the dmenson of the feature space. One method routnely appled to solve ths problem s to assume that all classes have equal covarance matrces, and to use as ther estmates the pooled covarance matrx. Ths covarance matrx s a weghtng average of each sample group covarance matrx and, assumng that all classes have the same number of tranng observatons, s gven by S pooled ¼ 1 g X g ¼1 S : ð4þ Snce more observatons are taken to calculate the pooled covarance matrx S pooled, ths one wll potentally have a hgher rank than S and wll be eventually full rank. Although the pooled estmate does provde a soluton for the algebrac problem arsng from the nsuffcent number of tranng observatons n each group, S pooled s theoretcally a consstent estmator of the true covarance matrces only when R 1 ¼ R 2 ¼¼R g. 4.1. Defnton The mxture covarance matrx s a lnear combnaton between the pooled covarance matrx S pooled and the sample covarance matrx of one class S. It s gven by S mx ðw Þ¼w S pooled þð1 w ÞS ; ð5þ where the mxture parameter w takes on values 0 < w 6 1 and s dfferent for each class. Ths parameter controls the degree of shrnkage of the sample group covarance matrx estmates toward the pooled one. Fg. 1 llustrates the geometrc dea of the mxture covarance matrx on a two-dmensonal feature space contanng three hypothetcal classes. The constant probablty denstes contours of S and S pooled are represented by the dashed and dotted gray ellpses respectvely. The mxture covarance estmates assume that the ellpses correspondng to the true covarance matrces are placed somewhere n between S and S pooled contours, as shown by the sold black ellpses. Each S mx matrx has the mportant property of admttng an nverse f the pooled estmate S pooled does so (Magnus and Neudecker, 1999, pp. 21 22). Ths mples that f the pooled estmate s nonsngular and the mxture parameter takes on values w > 0, then the S mx wll be non-sngular. Therefore the remanng queston s: what s the value of the w that gves a relevant lnear mxture between the pooled and sample covarance x 2 4. Mxture covarance matrx The choce between the sample group covarance matrx and the pooled covarance one represents a lmted set of estmates for the true covarance matrx. A more flexble set can be obtaned usng the mxture covarance matrx. Fg. 1. Geometrc dea of the mxture covarance matrx. x 1
2162 C.E. Thomaz et al. / Pattern Recognton Letters 24 (2003) 2159 2165 estmates? A method that determnes an approprate value of the mxture parameter s descrbed n the next secton. 4.2. The mxture parameter Accordng to Hoffbeck and Landgrebe (1996), the value of the mxture parameter w can be approprately selected so that a best ft to the tranng samples s acheved. Ther technque s based on the leave-one-out-lkelhood (L) parameter estmaton (Fukunaga, 1990). In the L method, one observaton of the class tranng set s removed and the mean and covarance matrx from the remanng k 1 examples s estmated. After that the lkelhood of the excluded sample s calculated gven the prevous mean and covarance matrx estmates. Ths operaton s repeated a further k 1 tmes and the average log lkelhood s computed over all the k observatons. The strategy s to evaluate several dfferent values of w n the range 0 < w 6 1, and then choose w that maxmzes the average log lkelhood. The mean of class wthout observaton r may be computed as x nr ¼ 1 ðk 1Þ " X k! x ;j x ;r #: ð6þ " L ðw Þ¼ 1 X k h # ln f x ;r jx nr ; S mx nr k ðw Þ ; ð9þ r¼1 where f ðx ;r jx nr ; Snr mxðw ÞÞ s the Gaussan probablty functon defned n Eq. (1) wth x nr mean vector and Snr mxðw Þ covarance matrx defned as S mx nr ðw Þ¼w S poolednr þð1 w ÞS nr : ð10þ As Hoffbeck and Landgrebe (1996) ponted out, ths approach, f mplemented n a straghtforward way, would requre computng the nverse and determnant of the Snr mxðw Þ for each tranng sample. Snce the Snr mxðw Þ s a p by p matrx and p s typcally a large number, ths computaton would be qute expensve. However, they showed that t s possble to sgnfcantly reduce the requred computaton by usng the Sherman Morrson Woodbury formula (Golub and Van Loan, 1989, p. 51) gven by ða þ uu T Þ 1 ¼ A 1 A 1 uu T A 1 1 þ u T A 1 u : ð11þ where A s a n by n matrx and u s a n by 1 vector. Ths allowed them to wrte the log lkelhood of the excluded samples n an analogous form as follows: h ln f x ;r jx nr ; S mx nr ðw Þ The notaton n r conforms to the Hoffbeck and Landgrebe (1996) work. It ndcates that the correspondng quantty s calculated wth the rth observaton from class removed. Followng the same dea, the sample covarance matrx and the pooled covarance matrx of class wthout observaton r are S nr ¼ S poolednr 1 ðk 2Þ " X k ðx ;j x nr Þðx ;j x nr Þ T! ðx ;r x nr Þðx ;r x nr Þ T #; ð7þ ¼ 1 g " X g! S j S þ S nr #: ð8þ Thus the average log lkelhood of the excluded observatons can be calculated as follows: ¼ p 2 lnð2pþ 1 ln jqjð1 2 ½ vdþš 1 2 k d 2 k 1 1 vd where Q ¼ ð1 w Þ ðk 1Þ ðk 2Þ þ w 1 gðk 2Þ ; ð12þ S þ w S pooled ; ð13þ k v ¼ ðk 1Þðk 2Þ 1 ðg 1Þ w ; ð14þ g d ¼ðx ;r x Þ T Q 1 ðx ;r x Þ: ð15þ Fnally, when the parameter w s selected, the mxture covarance matrx estmate defned n Eq. (5) s calculated usng all the tranng examples and replaced nto the maxmum probablty classfer.
C.E. Thomaz et al. / Pattern Recognton Letters 24 (2003) 2159 2165 2163 5. Experments Two bometrc experments wth two dfferent databases were performed. In the face recognton experment the olvett face database (ORL) contanng ten mages for each of 40 ndvduals, a total of 400 mages, were used. The Tohoku Unversty has provded the database for the facal expresson experment. Ths database s composed of 193 mages of expressons posed by nne Japanese females (Lyons et al., 1999). Each person posed three or four examples of each sx fundamental facal expresson: anger, dsgust, fear, happness, sadness and surprse. The database has at least 29 mages for each fundamental facal expresson. For mplementaton convenence all mages were frst reszed to 64 64 pxels. The experments were carred out as follows. Frst PCA reduces the dmensonalty of the orgnal mages and secondly the Gaussan maxmum probablty classfer usng one out of the three covarance estmates S (or Sgroup), S pooled (or Spooled) and S mx (or Smx) was appled. Each experment was repeated 25 tmes usng several PCA dmensons. Dstnct tranng and test sets were randomly drawn, and the mean and standard devaton of the recognton rate were calculated. The face recognton classfcaton was computed usng for each ndvdual fve mages to tran and fve mages to test. In the facal expresson recognton, the tranng and test sets were respectvely composed of 20 and 9 mages. The sze of the mxture parameter (0 < w 6 1) optmsaton range was taken to be 20, that s w ¼½0:05; 0:10; 0:15;...; 1Š. 6. Results Tables 1 and 2 present the tranng and test average recognton rates (wth standard devatons) of the face and facal expresson databases, respectvely, over the dfferent PCA dmensons. Also the optmsed mxture parameters w over the common PCA components of both applcatons are shown n Table 3. Snce only fve mages of each ndvdual were used to form the face recognton tranng set, the results relatve to the sample group covarance estmate were lmted to four PCA components. Table 1 shows that n all but one experment the S mx (or Smx) estmate led to hgher accuracy than ether the pooled or the sample group covarance matrces. In terms of how senstve the mxture covarance results were to the choce of the tranng and test sets, t s far to say that the S mx standard devatons were smlar to the pooled estmate. Table 2 shows the results of the facal expresson recognton. For more than 20 components when the sample group covarance estmate became sngular, the mxture covarance estmate reached hgher recognton rates than the pooled covarance estmate. Agan, regardng the computed standard devatons, the S mx estmate showed to be as senstve to the choce of the tranng and test sets as the other two estmates. Table 1 Face recognton results PCA Sgroup Spooled Smx components Tranng Test Tranng Test Tranng Test 4 99.5 (0.4) 51.6 (4.4) 73.3 (3.1) 59.5 (3.0) 90.1 (2.1) 70.8 (3.2) 10 96.6 (1.2) 88.4 (1.4) 99.4 (0.5) 92.0 (1.5) 20 99.2 (0.6) 91.8 (1.8) 100.0 (0.1) 94.5 (1.7) 30 99.9 (0.2) 94.7 (1.7) 100.0 (0.0) 95.9 (1.5) 40 100.0 (0.0) 95.4 (1.5) 100.0 (0.0) 96.2 (1.6) 50 100.0 (0.0) 95.7 (1.2) 100.0 (0.0) 96.4 (1.5) 60 100.0 (0.0) 95.0 (1.6) 100.0 (0.0) 95.8 (1.6) 70 100.0 (0.0) 94.9 (1.6) 100.0 (0.0) 95.4 (1.6)
2164 C.E. Thomaz et al. / Pattern Recognton Letters 24 (2003) 2159 2165 Table 2 Facal expresson recognton results PCA Sgroup Spooled Smx components Tranng Test Tranng Test Tranng Test 5 41.5 (4.2) 20.6 (3.9) 32.3 (3.0) 21.6 (3.8) 34.9 (3.3) 21.3 (4.1) 10 76.3 (3.6) 38.8 (5.6) 49.6 (3.9) 26.5 (6.8) 58.5 (3.7) 27.9 (5.6) 15 99.7 (0.5) 64.3 (6.4) 69.1 (3.6) 44.4 (5.3) 82.9 (2.9) 49.7 (7.7) 20 81.2 (2.6) 55.9 (7.7) 91.4 (2.8) 61.3 (7.1) 25 86.9 (2.8) 64.9 (6.9) 94.8 (2.2) 68.3 (5.1) 30 91.9 (1.7) 70.1 (7.8) 96.8 (1.3) 72.3 (6.2) 35 94.3 (1.7) 72.0 (7.4) 97.7 (1.1) 75.6 (5.5) 40 95.9 (1.4) 75.6 (7.1) 98.3 (1.1) 77.2 (5.7) 45 96.7 (1.3) 78.4 (6.5) 98.6 (0.8) 79.1 (5.4) 50 97.6 (1.0) 79.4 (5.8) 99.2 (0.7) 81.0 (6.6) 55 98.5 (0.9) 81.6 (6.6) 99.5 (0.6) 82.8 (6.3) 60 99.1 (0.8) 82.1 (5.9) 99.6 (0.6) 83.6 (7.2) 65 99.5 (0.6) 83.3 (5.5) 99.8 (0.4) 84.5 (6.2) Table 3 The average (wth standard devatons) of the optmum mxture parameters PCA Lnear mxture parameter components Face Facal expresson 10 0.58 (0.25) 0.76 (0.19) 20 0.65 (0.21) 0.49 (0.15) 30 0.71 (0.18) 0.56 (0.15) 40 0.77 (0.16) 0.67 (0.15) 50 0.82 (0.13) 0.77 (0.11) 60 0.85 (0.11) 0.85 (0.09) Another result revealed by these experments s related to the optmum mxture parameters w. Table 3 shows the average (wth standard devatons) of the selected mxture parameter w over the common face and facal expresson PCA components. It can be seen that as the dmenson of the feature space ncreases, the average and standard devaton of the mxture parameter w n all but one experment ncreases and decreases respectvely, makng the mxture covarance of each class (S mx ) more smlar to the pooled covarance (S pooled ) than the sample group one (S ). Although ths behavour depends on the applcatons consdered, t suggests that n both pre-processed mage classfcaton tasks the sparseness of the sample group covarance matrx could nfluence ts lnear combnaton to the pooled covarance matrx. In other words, t seems that when the group sample szes are small compared wth the dmenson of the feature space, the pooled nformaton s more relable than that provded sparsely by each group. Research s currently beng done n order to understand and prove ths behavour under certan constrants (Thomaz et al., 2001, 2002). 7. Conclusons Ths paper used an estmate for the sample group covarance matrces, here called mxture covarance matrces, gven by an approprate lnear combnaton of the sample group covarance matrx and the pooled covarance one. The mxture covarance matrces have the same rank as the pooled estmate, whle allowng a dfferent estmate for each group. Extensve experments were carred out to evaluate ths approach on two bometrc recognton tasks: face recognton and facal expresson recognton. A Gaussan maxmum probablty classfer was bult usng the mxture estmate and the typcal sample group and pooled estmates. In both tasks the mxture covarance estmate acheved the hghest classfcaton performance. Regardng the senstvty to the choce of the tranng and test sets, the mxture covarance matrces gave a smlar performance to the other two usual estmates. The experments were carred out usng wellframed mages wthout normalsaton, n order to
C.E. Thomaz et al. / Pattern Recognton Letters 24 (2003) 2159 2165 2165 compare the performance of the dfferent covarance estmators. We would expect the smlar comparatve results, perhaps wth an overall mprovement n performance, would be obtaned by usng technques such as normalsaton or by ncorporatng other mage features. The results presented n ths work suggested that n both pre-processed mage classfcaton tasks the sparseness of the sample group covarance matrx could nfluence ts lnear combnaton to the pooled covarance matrx. Further work s beng undertaken to study ths relatonshp. Acknowledgements The authors would lke to thank the referees for ther constructve comments whch helped to mprove ths paper. The frst author was partally supported by the Brazlan Government Agency CAPES under the grant no. 1168/99-1. References Bshop, C.M., 1997. Neural Networks for Pattern Recognton. Oxford Unversty Press, New York. Fukunaga, K., 1990. Introducton to Statstcal Pattern Recognton. Academc Press, Boston. Golub, G.H., Van Loan, C.F., 1989. Matrx Computatons. Johns Hopkns Unversty Press, Baltmore. Hoffbeck, J.P., Landgrebe, D.A., 1996. Covarance matrx estmaton and classfcaton wth lmted tranng data. IEEE Trans. Pattern Anal. Machne Intell. 18 (7), 763 767. Johnson, R.A., Wchern, D.W., 1998. Appled Multvarate Statstcal Analyss. Prentce Hall, New Jersey. Krby, M., Srovch, L., 1990. Applcaton of the Karhunen Loeve procedure for the characterzaton of human faces. IEEE Trans. Pattern Anal. Machne Intell. 12 (1), 103 108. Lu, C., Wechsler, H., 2000. Learnng the face space representaton and recognton. In: Proc. 15th Internat. Conf. on Pattern Recognton, ICPRÕ2000. Lyons, M.J., Budynek, J., Akamatsu, S., 1999. Automatc classfcaton of sngle facal mages. IEEE Trans. Pattern Anal. Machne Intell. 21 (12), 1357 1362. Magnus, J.R., Neudecker, H., 1999. Matrx Dfferental Calculus wth Applcatons n Statstcs and Econometrcs. John Wley & Sons Ltd., Chchester. Thomaz, C.E., Glles, D.F., Fetosa, R.Q., 2001. Small sample problem n bayes plug-n classfer for mage recognton. In: Proc. Internat. Conf. on Image and Vson Computng New Zealand. pp. 295 300. Thomaz, C.E., Glles, D.F., Fetosa, R.Q., 2002. A new quadratc classfer appled to bometrc recognton. In: Proc. Post-ECCV Workshop on Bometrc Authentcaton, Sprnger-Verlag LNCS 2359, Copenhagen, Denmark, pp. 186 196. Turk, M., Pentland, A., 1991. Egenfaces for recognton. J. Cogntve Neurosc. 3, 71 86. Zhao, W., Chellappa, R., Krshnaswamy, A., 1998. Dscrmnant analyss of prncpal components for face recognton. In: Proc. 2nd Internat. Conf. on Automatc Face and Gesture Recognton. pp. 336 341.