Single and multiple stage classifiers implementing logistic discrimination

Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga, 6681, CEP 90619-900, Porto Alegre, RS, Brazl helorb@pucrs.br 2 Centro Estadual de Pesqusas em Sensoramento Remoto e Meteorologa, UFRGS Av. Bento Gonçalves, 9500, CEP 91501-970, RS, Brazl {dalter, vctor.haertel}@ufrgs.br Abstract. The logstc dscrmnaton technque can be regarded as a partally parametrc approach to pattern recognton. Ths method s general and robust because t doesn t make assumptons on the underlyng dstrbuton of data. Moreover, t requres the estmaton of fewer parameters than some better-known procedures such as the Gaussan maxmum lkelhood dscrmnator. In ths paper, two dfferent approaches to mplement logstc dscrmnaton are presented. Experments were performed usng hgh-dmensonal mage data acqured by the AVIRIS system, showng a test area whch ncludes a number of classes spectrally very smlar. Results are presented and dscussed Keywords: remote sensng, mage processng, logstc dscrmnaton, sensoramento remoto, processamento de magens, dscrmnação logístca. 1. Introducton Statstcal pattern recognton appled to dgtal mage classfcaton has been extensvely dscussed n the scentfc lterature. Accordng to Jan et al (2000), from 1979 to 2000, 350 artcles dealng wth pattern recognton were publshed n IEEE Transactons on Pattern Analyss and Machne Intellgence. Out of ths total, 85% were related to statstcal pattern recognton. In the statstcal approach, each pattern s regarded as a p-dmensonal random vector, where p s the number of varables used n classfcaton. A number of statstcal methods for classfcaton have been proposed, each one presentng advantages and dsadvantages. Accordng Bttencourt and Clarke (2003) technques requrng assumptons on the functonal shape of the varables n the feature space nvolve parameter estmaton, and are therefore termed parametrc. Dependng on the classfcaton method, the number of classes present and the number of varables used, the number of parameters to be estmated may vary substantally. In the Gaussan maxmum-lkelhood method, par example, the underlyng probablty dstrbutons are assumed to be multvarate Normal and the number of parameters to be estmated can be very large. Haertel and Landgrebe (1999) states that that the estmaton of the wthn-class covarance matrces s one of the most dffcult problems n dealng wth hgh-dmensonal data, especally due the fact that the number of avalable tranng samples s frequently very lmted. Methods based on logstc dscrmnaton have advantages compared to other parametrc methods because the assumptons requred are consderably weaker and the number of parameters to be estmated s smaller. In ths paper two possble approaches to pattern classfcaton usng based on logstc model are nvestgated. The frst one sngle stage mplements the concepts developed n logstc dscrmnaton, as derved from the 6431

multnomal logstc regresson model. The second one multple stage conssts n successve applcatons of a tradtonal bnary logstc model, usng a dstance measure to select the classes to be dealt wth at each stage. In the followng sectons these two logstc models are llustrated usng AVIRIS hyperspectral mage data. The results are presented and dscussed. 2. Logstc Dscrmnaton Logstc Dscrmnaton was orgnally proposed by Bttencourt and Clarke (2000) n the classfcaton of remote sensng mage data of natural scenes. The man advantage of ths approach as compared wth the more tradtonal quadratc classfers such as the Gaussan maxmum lkelhood les n the smaller number of parameters to be estmated. In ths case, the probablty that a pattern x belongs to the class w can be drectly estmated by P ( x) exp w = k 1 1+ j= 1 T ( β 0 + β x) T ( β + β x) Here, β and 0 β are the model parameters, wth β termed the ntercept and 0 β s a vector of parameters assocated wth the p characterstcs of the vector x. The logstc model requres the estmaton of k-1 vectors nvolvng the parameters β, correspondng to the k-1 classes present n the mage. The k-th class s taken as a bass, from whch the natural log of the rato of the two probabltes become lnear functons of the parameters. Ths logarthm s known as the logt functon. The reducton n the number of parameters plays an mportant role whenever the number of tranng samples avalable s small compared to the data dmensonalty. Therefore, the logstc dscrmnaton may be of partcular nterest n remote sensng hyperspectral mage data classfcaton. Another possble approach to deal wth the small sample sze problem s the mplementaton of the multple stage approach n the classfcaton process. In ths case, several tradtonal bnary logstc regressons are used at each stage, each one dealng wth a par of classes at a tme. Decson tree classfer s a type of herarchcal classfer that has been nvestgated by several authors such as Moraes (2005) and Bttencourt and Clarke (2003b). 3. Sngle Stage and Multple Stage Classfers An example of the tradtonal sngle stage classfer methodology appled to a four classes problem s llustrated n Fgure 1. The patterns to be labeled (pxels) are ntally clustered n a sngle group. Decson functons are then estmated for each ndvdual class under consderaton. Next, each ndvdual pattern s nserted nto the decsons functons and labeled accordng to the wnner. exp j0 j Fgure 1 Structure of sngle stage classfer 6432

In the context of a sngle stage classfer, a problem that frequently arses deals wth the selecton of the varables to be used by the classfer. In order to prevent that the number of parameters to be estmated becomes too large, t s advsable to use a sub-set of varables, ncludng the ones that show a hgher dscrmnant power among the classes nvolved only. In the sngle stage approach all classes are consdered smultaneously a fact that turns ths selecton nto a dffcult problem. An another possble approach to the multple class cases conssts n the use of multple stage classfers, or decson tree classfers, where the global problem s subdvded n smallest local problems. In ths approach, the classfcaton process of each pxel consders n each stage only a sub-set of classes. However, as the number of classes ncreases, so does the number of possble structures for the decson tree, turnng the selecton of the optmal tree structure nto a dffcult problem. A possble approach to solve ths problem was proposed by Moraes (2005), whch conssts n adoptng a pre-defned bnary structure to the tree. In that work, the author proves the superorty of the defned structure over all others possble structures of bnary tree classfers. Fgure 2 Structure of multple stage classfer In the case of herarchcal dscrmnaton between multple classes, the multple stage classfer structure s then defned as follows. Frst, at each stage of the classfer, statstcal dstances separatng two classes at a tme n node d are estmated by makng use of the avalable tranng samples. In ths study the Bhattacharyya dstance (Moraes, 2005) was used. The par of classes showng the largest dstance s then selected to defne the two descendng nodes. The tranng samples avalable to the selected classes are entrely allocated nto the correspondng descendng nodes. The tranng samples belongng to the remanng classes n the parent node are classfed nto one of the descendng nodes accordng wth a decson rule. Ths process s then repeated untl the termnal nodes nodes ncludng samples of a sngle class are reached. Once the structure of the bnary multple stage classfer s defned, as llustrated n the example shown n Fgure 2. The actual classfcaton of ndvdual pxels n the mage data can start. In each node pxels are compared to two classes only, namely, the par prevously selected by the largest statstcal dstance crteron, allowng the straght applcaton of the formal logstc dscrmnaton method. Ths process s appled sequentally across the tree, untl the pxel reaches a termnal node n whch case t s labeled accordng wth ths node. 6433

4. Results Hyper spectral mage data acqured by the AVIRIS system wth known ground truth was used to compare the effcency of the sngle stage classfer aganst the accuracy provded by the multple stage classfer approach. Out of the 224 bands avalable by Avrs sensor only 190 were consdered n the experments, due to the nosy effects of the atmospherc water vapor n the remanng bands. Fgure 3 presents the study area and the Fgure 4 shows the spectral sgnature of the four classes that were mplemented nto the classfer. A sample of 95 spectral bands, chosen systematcally, was used to reduce the computatonal cost. The tranng sample sze was chosen equal to 500 pxels per class. The same set of samples was also used for testng purposes (re-substtuton method). All procedures of estmaton were made n the software SPSS release 13.0 Fgure 3 Study area (composte color of AVIRIS mage) and ground truth Fgure 4 Spectral sgnature to the four classes (corn notll; corn mn; corn and soy notll) In the experments, the sngle stage classfer yelded 73,7% mean accuracy. As the classes n the experments are spectrally very smlar, there was confuson between them especally among the classes corn, corn mn and corn notll. The soy notll class s the easest 6434

to dscrmnate. The Table 1 shows the results. The software SPSS needed about 10 mnutes to realze ths task n a PC AMD Athlon 3GHz wth 512Mb. Table 1 Classfcaton table usng the logstc dscrmnaton sngle stage classfer Predcted Observed Corn Corn mn Corn notll Soy notll Percent Correct Corn 382 59 46 13 76,4% Corn mn 92 334 50 24 66,8% Corn notll 25 62 362 51 72,4% Soy notll 37 26 41 396 79,2% Overall Percentage 73,7% In the case of multple stage classfer, there are n ths experments 96 parameters to be estmated at each node of decson tree, totalzng seven decson functons and 672 parameters. In the tree structured algorthm mplemented n ths experment, there are eght dfferent paths across the tree that a pxel can follow to reach a termnal node. The SPSS software doesn t make ths procedure automatcally, so a lttle program was wrtten n SPSS language to solve ths problem. There was necessary just few seconds to estmate the parameters at each stage, because the estmaton process at the tradtonal bnary logstc model s faster than multnomal logstc dscrmnaton. The pxels dstrbuton among the nodes, usng a 2000 pxels valdaton sample, s shown n Fgure 5. Node 1 {1,2,3,4} 977 pxels 1023 pxels Node 2 {1,2,3} Node 3 {1,2,4} 266 pxels 711 pxels 347 pxels 676 pxels Node 4 {1,2} Node 5 {2,3} Node 6 {1,2} Node 7 {1,4} {1} {2} {2} {3} {1} {2} {1} {4} 155 111 194 517 119 228 215 461 Number of pxels at termnal nodes Fgure 5 The process of classfcaton n multple stage classfer The mean accuracy yelded by the multple stage classfer was slghtly better n comparson of sngle stage. The fnal classfcaton table obtaned by usng the valdaton sample s shown n Table 2. 6435

Table 2 Classfcaton table usng the logstc dscrmnaton multple stage classfer Predcted Observed Corn Corn mn Corn notll Soy notll Percent Correct Corn 392 50 20 38 78,4% Corn mn 31 368 85 16 73,6% Corn notll 24 78 382 16 76,4% Soy notll 42 37 30 391 78,2% Overall Percentage 76,7% 5. Fnal Remarks The results found allow to make the followng fnal remarks: a) The multple stage classfer presents better results than sngle one. The dfference between them n the overall performance s not so great and the results agree wth Moraes (2005) that found better results when a classfcaton trees was used. b) Regardng the processng tme, the multple stage classfer reached the estmates faster than sngle stage method. c) Based on these results we suggest that the herarchcal classfers can be used as an alternatve to the sngle stage classfers. d) It s recommendable dvde a classfcaton complex problem n several smpler ones. 6. Acknowledgments The authors acknowledge to CNPq Edtal Unversal for ts support. References Bttencourt, H. R. ; Clarke, R. T. Estudo comparatvo entre o modelo de dscrmnação logístco e o método da máxma verossmlhança gaussana. In: Smposo Latnoamercano de Percepcón Remota, 9, 2000, Puerto Iguazú, Argentna. Memoras... Luján, SELPER, 2000. CD-ROM. Dsponível em: <http://www.selper.org/trabajos/pro009.pdf>. Bttencourt, H. R. ; Clarke, R. T.. Logstc Dscrmnaton Between Classes wth Nearly Equal Spectral Response n Hgh Dmensonalty. In: 2003 IEEE Internatonal Geoscence and Remote Sensng Symposum, 2003, Toulouse, France. Annals 2003 IEEE IGARSS. Pscataway, NJ - USA: IEEE Operatons Center, 2003a. Bttencourt, H. R. ; Clarke, R. T.. Use of Classfcaton and Regresson Trees (CART) to Classfy Remotely- Sensed Dgtal Images. In: 2003 IEEE Internatonal Geoscence and Remote Sensng Symposum, 2003, Toulouse, France. Annals 2003 IEEE IGARSS. Pscataway, NJ - USA: IEEE Operatons Center, 2003b. Haertel, V.; Landgrebe, D. On the classfcaton of classes wth nearly equal spectral response n remote sensng hyperspectral mage data. IEEE Transactons on Geoscence and Remote Sensng, v. 37, n. 5., pp. 2374-2386, 1999. Jan, A. K.; Dun, R.P.; Mao J. Statstcal Pattern Recognton: A Revew. IEEE Transactons on Pattern Analyss and Machne Intellgence, v. 22, n. 1, pp. 04-37, 2000. Moraes, D. A. O. Extração de Feções em Dados Imagem com Alta Dmensão por Otmzação da Dstanca de Bhattacharyya em um Classfcador de Decsão em Arvore. Dssertação de Mestrado em Sensoramento Remoto - PPGSR, Unversdade Federal do Ro Grande do Sul, CNPq, 99 p., 2005. 6436