Single and multiple stage classifiers implementing logistic discrimination



Similar documents
Forecasting the Direction and Strength of Stock Market Movement

L10: Linear discriminants analysis

What is Candidate Sampling

Statistical Methods to Develop Rating Models

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

The OC Curve of Attribute Acceptance Plans

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

Logistic Regression. Steve Kroon

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Gender Classification for Real-Time Audience Analysis System

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

How To Calculate The Accountng Perod Of Nequalty

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

Learning from Large Distributed Data: A Scaling Down Sampling Scheme for Efficient Data Processing

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST)

RequIn, a tool for fast web traffic inference

Lecture 2: Single Layer Perceptrons Kevin Swingler

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

Detecting Credit Card Fraud using Periodic Features

A Performance Analysis of View Maintenance Techniques for Data Warehouses

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

Calculating the high frequency transmission line parameters of power cables

Lecture 5,6 Linear Methods for Classification. Summary

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

Calculation of Sampling Weights

Traffic-light a stress test for life insurance provisions

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

Binomial Link Functions. Lori Murray, Phil Munz

Stochastic Protocol Modeling for Anomaly Based Network Intrusion Detection

ONE of the most crucial problems that every image

Performance Analysis and Coding Strategy of ECOC SVMs

A DATA MINING APPLICATION IN A STUDENT DATABASE

Regression Models for a Binary Response Using EXCEL and JMP

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

Optimal Customized Pricing in Competitive Settings

A Multi-mode Image Tracking System Based on Distributed Fusion

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

An Alternative Way to Measure Private Equity Performance

Analysis of Premium Liabilities for Australian Lines of Business

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM

Method for assessment of companies' credit rating (AJPES S.BON model) Short description of the methodology

Learning from Multiple Outlooks

Statistical Approach for Offline Handwritten Signature Verification

1 Example 1: Axis-aligned rectangles

Fast Fuzzy Clustering of Web Page Collections

v a 1 b 1 i, a 2 b 2 i,..., a n b n i.

Credit Limit Optimization (CLO) for Credit Cards

Return decomposing of absolute-performance multi-asset class portfolios. Working Paper - Nummer: 16

Prediction of Stock Market Index Movement by Ten Data Mining Techniques

Statistical algorithms in Review Manager 5

Transition Matrix Models of Consumer Credit Ratings

Effective wavelet-based compression method with adaptive quantization threshold and zerotree coding

Project Networks With Mixed-Time Constraints

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

Improved SVM in Cloud Computing Information Mining

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Design and Development of a Security Evaluation Platform Based on International Standards

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

A PROBABILITY-MAPPING ALGORITHM FOR CALIBRATING THE POSTERIOR PROBABILITIES: A DIRECT MARKETING APPLICATION

Mining Multiple Large Data Sources

Survival analysis methods in Insurance Applications in car insurance contracts

A practical approach to combine data mining and prognostics for improved predictive maintenance

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

The Current Employment Statistics (CES) survey,

This circuit than can be reduced to a planar circuit

Marginal Benefit Incidence Analysis Using a Single Cross-section of Data. Mohamed Ihsan Ajwad and Quentin Wodon 1. World Bank.

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

SUPPLIER FINANCING AND STOCK MANAGEMENT. A JOINT VIEW.

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告

Adaptive Fractal Image Coding in the Frequency Domain

CHAPTER 14 MORE ABOUT REGRESSION

Recurrence. 1 Definitions and main statements

Politecnico di Torino. Porto Institutional Repository

STANDING WAVE TUBE TECHNIQUES FOR MEASURING THE NORMAL INCIDENCE ABSORPTION COEFFICIENT: COMPARISON OF DIFFERENT EXPERIMENTAL SETUPS.

Financial Instability and Life Insurance Demand + Mahito Okura *

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

Decision Tree Model for Count Data


Efficient Project Portfolio as a tool for Enterprise Risk Management

Transcription:

Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga, 6681, CEP 90619-900, Porto Alegre, RS, Brazl helorb@pucrs.br 2 Centro Estadual de Pesqusas em Sensoramento Remoto e Meteorologa, UFRGS Av. Bento Gonçalves, 9500, CEP 91501-970, RS, Brazl {dalter, vctor.haertel}@ufrgs.br Abstract. The logstc dscrmnaton technque can be regarded as a partally parametrc approach to pattern recognton. Ths method s general and robust because t doesn t make assumptons on the underlyng dstrbuton of data. Moreover, t requres the estmaton of fewer parameters than some better-known procedures such as the Gaussan maxmum lkelhood dscrmnator. In ths paper, two dfferent approaches to mplement logstc dscrmnaton are presented. Experments were performed usng hgh-dmensonal mage data acqured by the AVIRIS system, showng a test area whch ncludes a number of classes spectrally very smlar. Results are presented and dscussed Keywords: remote sensng, mage processng, logstc dscrmnaton, sensoramento remoto, processamento de magens, dscrmnação logístca. 1. Introducton Statstcal pattern recognton appled to dgtal mage classfcaton has been extensvely dscussed n the scentfc lterature. Accordng to Jan et al (2000), from 1979 to 2000, 350 artcles dealng wth pattern recognton were publshed n IEEE Transactons on Pattern Analyss and Machne Intellgence. Out of ths total, 85% were related to statstcal pattern recognton. In the statstcal approach, each pattern s regarded as a p-dmensonal random vector, where p s the number of varables used n classfcaton. A number of statstcal methods for classfcaton have been proposed, each one presentng advantages and dsadvantages. Accordng Bttencourt and Clarke (2003) technques requrng assumptons on the functonal shape of the varables n the feature space nvolve parameter estmaton, and are therefore termed parametrc. Dependng on the classfcaton method, the number of classes present and the number of varables used, the number of parameters to be estmated may vary substantally. In the Gaussan maxmum-lkelhood method, par example, the underlyng probablty dstrbutons are assumed to be multvarate Normal and the number of parameters to be estmated can be very large. Haertel and Landgrebe (1999) states that that the estmaton of the wthn-class covarance matrces s one of the most dffcult problems n dealng wth hgh-dmensonal data, especally due the fact that the number of avalable tranng samples s frequently very lmted. Methods based on logstc dscrmnaton have advantages compared to other parametrc methods because the assumptons requred are consderably weaker and the number of parameters to be estmated s smaller. In ths paper two possble approaches to pattern classfcaton usng based on logstc model are nvestgated. The frst one sngle stage mplements the concepts developed n logstc dscrmnaton, as derved from the 6431

multnomal logstc regresson model. The second one multple stage conssts n successve applcatons of a tradtonal bnary logstc model, usng a dstance measure to select the classes to be dealt wth at each stage. In the followng sectons these two logstc models are llustrated usng AVIRIS hyperspectral mage data. The results are presented and dscussed. 2. Logstc Dscrmnaton Logstc Dscrmnaton was orgnally proposed by Bttencourt and Clarke (2000) n the classfcaton of remote sensng mage data of natural scenes. The man advantage of ths approach as compared wth the more tradtonal quadratc classfers such as the Gaussan maxmum lkelhood les n the smaller number of parameters to be estmated. In ths case, the probablty that a pattern x belongs to the class w can be drectly estmated by P ( x) exp w = k 1 1+ j= 1 T ( β 0 + β x) T ( β + β x) Here, β and 0 β are the model parameters, wth β termed the ntercept and 0 β s a vector of parameters assocated wth the p characterstcs of the vector x. The logstc model requres the estmaton of k-1 vectors nvolvng the parameters β, correspondng to the k-1 classes present n the mage. The k-th class s taken as a bass, from whch the natural log of the rato of the two probabltes become lnear functons of the parameters. Ths logarthm s known as the logt functon. The reducton n the number of parameters plays an mportant role whenever the number of tranng samples avalable s small compared to the data dmensonalty. Therefore, the logstc dscrmnaton may be of partcular nterest n remote sensng hyperspectral mage data classfcaton. Another possble approach to deal wth the small sample sze problem s the mplementaton of the multple stage approach n the classfcaton process. In ths case, several tradtonal bnary logstc regressons are used at each stage, each one dealng wth a par of classes at a tme. Decson tree classfer s a type of herarchcal classfer that has been nvestgated by several authors such as Moraes (2005) and Bttencourt and Clarke (2003b). 3. Sngle Stage and Multple Stage Classfers An example of the tradtonal sngle stage classfer methodology appled to a four classes problem s llustrated n Fgure 1. The patterns to be labeled (pxels) are ntally clustered n a sngle group. Decson functons are then estmated for each ndvdual class under consderaton. Next, each ndvdual pattern s nserted nto the decsons functons and labeled accordng to the wnner. exp j0 j Fgure 1 Structure of sngle stage classfer 6432

In the context of a sngle stage classfer, a problem that frequently arses deals wth the selecton of the varables to be used by the classfer. In order to prevent that the number of parameters to be estmated becomes too large, t s advsable to use a sub-set of varables, ncludng the ones that show a hgher dscrmnant power among the classes nvolved only. In the sngle stage approach all classes are consdered smultaneously a fact that turns ths selecton nto a dffcult problem. An another possble approach to the multple class cases conssts n the use of multple stage classfers, or decson tree classfers, where the global problem s subdvded n smallest local problems. In ths approach, the classfcaton process of each pxel consders n each stage only a sub-set of classes. However, as the number of classes ncreases, so does the number of possble structures for the decson tree, turnng the selecton of the optmal tree structure nto a dffcult problem. A possble approach to solve ths problem was proposed by Moraes (2005), whch conssts n adoptng a pre-defned bnary structure to the tree. In that work, the author proves the superorty of the defned structure over all others possble structures of bnary tree classfers. Fgure 2 Structure of multple stage classfer In the case of herarchcal dscrmnaton between multple classes, the multple stage classfer structure s then defned as follows. Frst, at each stage of the classfer, statstcal dstances separatng two classes at a tme n node d are estmated by makng use of the avalable tranng samples. In ths study the Bhattacharyya dstance (Moraes, 2005) was used. The par of classes showng the largest dstance s then selected to defne the two descendng nodes. The tranng samples avalable to the selected classes are entrely allocated nto the correspondng descendng nodes. The tranng samples belongng to the remanng classes n the parent node are classfed nto one of the descendng nodes accordng wth a decson rule. Ths process s then repeated untl the termnal nodes nodes ncludng samples of a sngle class are reached. Once the structure of the bnary multple stage classfer s defned, as llustrated n the example shown n Fgure 2. The actual classfcaton of ndvdual pxels n the mage data can start. In each node pxels are compared to two classes only, namely, the par prevously selected by the largest statstcal dstance crteron, allowng the straght applcaton of the formal logstc dscrmnaton method. Ths process s appled sequentally across the tree, untl the pxel reaches a termnal node n whch case t s labeled accordng wth ths node. 6433

4. Results Hyper spectral mage data acqured by the AVIRIS system wth known ground truth was used to compare the effcency of the sngle stage classfer aganst the accuracy provded by the multple stage classfer approach. Out of the 224 bands avalable by Avrs sensor only 190 were consdered n the experments, due to the nosy effects of the atmospherc water vapor n the remanng bands. Fgure 3 presents the study area and the Fgure 4 shows the spectral sgnature of the four classes that were mplemented nto the classfer. A sample of 95 spectral bands, chosen systematcally, was used to reduce the computatonal cost. The tranng sample sze was chosen equal to 500 pxels per class. The same set of samples was also used for testng purposes (re-substtuton method). All procedures of estmaton were made n the software SPSS release 13.0 Fgure 3 Study area (composte color of AVIRIS mage) and ground truth Fgure 4 Spectral sgnature to the four classes (corn notll; corn mn; corn and soy notll) In the experments, the sngle stage classfer yelded 73,7% mean accuracy. As the classes n the experments are spectrally very smlar, there was confuson between them especally among the classes corn, corn mn and corn notll. The soy notll class s the easest 6434

to dscrmnate. The Table 1 shows the results. The software SPSS needed about 10 mnutes to realze ths task n a PC AMD Athlon 3GHz wth 512Mb. Table 1 Classfcaton table usng the logstc dscrmnaton sngle stage classfer Predcted Observed Corn Corn mn Corn notll Soy notll Percent Correct Corn 382 59 46 13 76,4% Corn mn 92 334 50 24 66,8% Corn notll 25 62 362 51 72,4% Soy notll 37 26 41 396 79,2% Overall Percentage 73,7% In the case of multple stage classfer, there are n ths experments 96 parameters to be estmated at each node of decson tree, totalzng seven decson functons and 672 parameters. In the tree structured algorthm mplemented n ths experment, there are eght dfferent paths across the tree that a pxel can follow to reach a termnal node. The SPSS software doesn t make ths procedure automatcally, so a lttle program was wrtten n SPSS language to solve ths problem. There was necessary just few seconds to estmate the parameters at each stage, because the estmaton process at the tradtonal bnary logstc model s faster than multnomal logstc dscrmnaton. The pxels dstrbuton among the nodes, usng a 2000 pxels valdaton sample, s shown n Fgure 5. Node 1 {1,2,3,4} 977 pxels 1023 pxels Node 2 {1,2,3} Node 3 {1,2,4} 266 pxels 711 pxels 347 pxels 676 pxels Node 4 {1,2} Node 5 {2,3} Node 6 {1,2} Node 7 {1,4} {1} {2} {2} {3} {1} {2} {1} {4} 155 111 194 517 119 228 215 461 Number of pxels at termnal nodes Fgure 5 The process of classfcaton n multple stage classfer The mean accuracy yelded by the multple stage classfer was slghtly better n comparson of sngle stage. The fnal classfcaton table obtaned by usng the valdaton sample s shown n Table 2. 6435

Table 2 Classfcaton table usng the logstc dscrmnaton multple stage classfer Predcted Observed Corn Corn mn Corn notll Soy notll Percent Correct Corn 392 50 20 38 78,4% Corn mn 31 368 85 16 73,6% Corn notll 24 78 382 16 76,4% Soy notll 42 37 30 391 78,2% Overall Percentage 76,7% 5. Fnal Remarks The results found allow to make the followng fnal remarks: a) The multple stage classfer presents better results than sngle one. The dfference between them n the overall performance s not so great and the results agree wth Moraes (2005) that found better results when a classfcaton trees was used. b) Regardng the processng tme, the multple stage classfer reached the estmates faster than sngle stage method. c) Based on these results we suggest that the herarchcal classfers can be used as an alternatve to the sngle stage classfers. d) It s recommendable dvde a classfcaton complex problem n several smpler ones. 6. Acknowledgments The authors acknowledge to CNPq Edtal Unversal for ts support. References Bttencourt, H. R. ; Clarke, R. T. Estudo comparatvo entre o modelo de dscrmnação logístco e o método da máxma verossmlhança gaussana. In: Smposo Latnoamercano de Percepcón Remota, 9, 2000, Puerto Iguazú, Argentna. Memoras... Luján, SELPER, 2000. CD-ROM. Dsponível em: <http://www.selper.org/trabajos/pro009.pdf>. Bttencourt, H. R. ; Clarke, R. T.. Logstc Dscrmnaton Between Classes wth Nearly Equal Spectral Response n Hgh Dmensonalty. In: 2003 IEEE Internatonal Geoscence and Remote Sensng Symposum, 2003, Toulouse, France. Annals 2003 IEEE IGARSS. Pscataway, NJ - USA: IEEE Operatons Center, 2003a. Bttencourt, H. R. ; Clarke, R. T.. Use of Classfcaton and Regresson Trees (CART) to Classfy Remotely- Sensed Dgtal Images. In: 2003 IEEE Internatonal Geoscence and Remote Sensng Symposum, 2003, Toulouse, France. Annals 2003 IEEE IGARSS. Pscataway, NJ - USA: IEEE Operatons Center, 2003b. Haertel, V.; Landgrebe, D. On the classfcaton of classes wth nearly equal spectral response n remote sensng hyperspectral mage data. IEEE Transactons on Geoscence and Remote Sensng, v. 37, n. 5., pp. 2374-2386, 1999. Jan, A. K.; Dun, R.P.; Mao J. Statstcal Pattern Recognton: A Revew. IEEE Transactons on Pattern Analyss and Machne Intellgence, v. 22, n. 1, pp. 04-37, 2000. Moraes, D. A. O. Extração de Feções em Dados Imagem com Alta Dmensão por Otmzação da Dstanca de Bhattacharyya em um Classfcador de Decsão em Arvore. Dssertação de Mestrado em Sensoramento Remoto - PPGSR, Unversdade Federal do Ro Grande do Sul, CNPq, 99 p., 2005. 6436