Neural Network-based Colonoscopic Diagnosis Using On-line Learning and Differential Evolution




George D. Magoulas, Vassilis P. Plagianakos* and Michael N. Vrahatis*

Department of Information Systems and Computing, Brunel University, Uxbridge UB8 3PH, United Kingdom
Phone: +44-895-74000, Fax: +44-895-5686
email: george.magoulas@brunel.ac.uk

* Department of Mathematics and University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, GR-60 Patras, Greece
Phone: +30-6-997374, Fax: ++30 6 99965
email: {vpp,vrahatis}@math.upatras.gr

ABSTRACT: In this paper, on-line training of neural networks is investigated in the context of computer-assisted colonoscopic diagnosis. A memory-based adaptation of the learning rate for the on-line Backpropagation is proposed and used to seed an on-line evolution process that applies a Differential Evolution Strategy to (re-)adapt the neural network to modified environmental conditions. Our approach looks at on-line training from the perspective of tracking the changing location of an approximate solution of a pattern-based, and, thus, dynamically changing, error function. The proposed hybrid strategy is compared with other standard training methods that have traditionally been used for training neural networks off-line. Results in interpreting colonoscopy images and frames of video sequences are promising and suggest that networks trained with this strategy detect malignant regions of interest with accuracy.

KEYWORDS: Minimally invasive imaging procedures, Backpropagation networks, Medical image interpretation, On-line learning, Differential evolution strategies, Artificial evolution.

1. INTRODUCTION

In medical practice, endoscopic diagnosis and other minimally invasive imaging procedures, such as computed tomography, ultrasonography, confocal microscopy, computed radiography, or magnetic resonance imaging, are now permitting visualization of previously inaccessible regions of the body. Their objective is to increase the expert's ability in identifying malignant regions and decrease the need for intervention while maintaining the ability for accurate diagnosis.
Furthermore, it may be possible to examine a larger area, studying living tissue in vivo - possibly at a distance [5] - and, thus, minimise the shortcomings of biopsies, such as limited number of tissue samples, delay in diagnosis, and discomfort for the patient.

In this paper, we focus on neural network-assisted diagnosis of colonoscopy images. Colonoscopy is a minimally invasive technique for the production of images of the colon: a narrow pipe-like structure, an endoscope, is passed into the patient's body. Video endoscopes have small cameras in their tips; when passed into a body, what the camera observes is displayed on a television monitor (see Figure 1 for frame samples of a video sequence). The physician controls the endoscope's direction using wheels and buttons, and the whole procedure is carried out under variable perceptual conditions (shadings, shadows, lighting condition variations, reflections etc.).

Figure 1. Frames of a video sequence showing a polypoid tumor of the colon.

Neural network-based methodologies present some interesting qualities, such as learning from experience and generalisation, and are able to handle uncertainty and ambiguity in distorted or noisy images to some extent. Thus, these methods provide human experts with significant assistance in medical diagnosis [8], [0], [], [3], [3]. The use of neural networks for the detection of malignant regions in colonoscopy video sequences encounters several problems: the time varying nature of the process; changes in the perceptual direction of the physician; variations in the diffused light conditions. In most of these cases, off-line learning or knowledge-based approaches are not able to represent all possible variations of the environment. On-line training and retraining allow the network to update its weights during operation by taking into account both the already stored knowledge and the knowledge extracted from the current data, and are proposed as alternatives to batch learning-based approaches. Of course, the main challenge when dealing with adaptive techniques for learning, such as on-line training and retraining, is to balance the information related to recently acquired data with the information already embodied in the network [3], [6], [7].
Thus, in this paper we explore on-line training and retraining of neural networks with the aim to detect malignant regions in colonoscopy images through a formulation of the problem that is based on the idea of tracking the moving optimum of a dynamically changing pattern-based error measure. This approach coincides with the way adaptation on the evolutionary time scale is considered [9], and allows us to explore and expand further research on the tracking performance of evolution strategies and genetic algorithms [], [9], [35]. Hence, the reader should keep in mind that in this paper we do not seek global minimisers of the error function, but we are interested in developing an on-line evolution strategy that will converge to an approximation of the optimum solution (the interesting topic of finding global minimisers in neural networks training is described elsewhere [4]).

The paper is organised as follows. Section 2 explains how textural variations of the tissue are modelled in our approach. Section 3 discusses existing learning approaches, while Section 4 describes the proposed on-line evolution strategy. In Section 5 experimental results are presented and findings are discussed. Lastly, conclusions are drawn in Section 6.

2. TISSUE CLASSIFICATION FOR ENDOSCOPIC DIAGNOSIS

In endoscopic diagnosis, the medical expert, based on a distributed percept of local changes, interprets the physical surface properties of the tissue - such as roughness or smoothness, regularity, and shape - to detect abnormalities. It is important to note, however, the vast difficulties in physical attributes of the organs. For example, in colonoscopy, no two colons are alike. Even within the same colon, one section may have very different characteristics from another. Adjacent regions of the colon lining showing different properties are distinguished on the basis of the textural variations of their tissue (pit patterns) []. These difficulties introduce severe limitations in the use of computer-assisted endoscopic diagnosis [3], [3]. Given a medical image, the true features associated with the physical surface properties of the tissue are not exactly known to the image-interpretation system developer. Usually, one or more feature-extraction models [6], [7] are used to provide values for each feature's parameters. The findings are then used to infer the correct interpretation. On this same task of interpretation on the basis of local changes of the properties of the tissue under examination, the performance of human perception is considered outstanding. Furthermore, medical experts have the ability to either add or remove components from an image to give meaning to what they see. Medical experts can also adapt to changes to the extent that even a distorted image can be recognized. In computerised systems, the classification of image regions is usually quite sophisticated and involves multiple levels of processing. In general, a model with three stages is employed as shown in Figure 2 (adapted from [6]).
Figure 2. Model for diagnostic system that uses medical images. Blocks: Image Formation (Images); Lower Level Processing (Enhancement, Feature Extraction, Segmentation); Higher Level Processing (Classification, Labeling, Outcome prediction); Diagnostics.

The lower-level processing takes image pixels as input and performs various tasks such as image enhancement, feature extraction and image segmentation. The higher-level processing takes the output from the lower-level processing as input and generates output related to medical diagnostics. Tasks accomplished in the higher-level processing include classification of features, detection of specific lesions and diagnosis for various abnormalities.

Figure 3. Stages in feature extraction: original image; image window extracted from the original image; feature extraction technique; feature vector ($A_1, A_2, A_3, \ldots, A_N$).

An important stage of the implementation is the feature extraction process (see Figure 3). In our experiments the method of cooccurrence matrices was used for feature extraction. Cooccurrence matrices [9] represent the spatial distribution and the dependence of the grey levels within a local area. Each $p(i,j)$ entry of the matrices represents the probability of going from one pixel with a grey level ($i$) to another with a grey level ($j$) under a predefined distance and angle. From these matrices several sets of statistical measures, or feature vectors, are computed to build different texture models. In our implementation, the colonoscopy image was separated into windows of size 16 × 16 pixels with 8 pixels overlap. Then the cooccurrence matrices algorithm was used to gather information from the pixels of an image window. Four angles, namely 0°, 45°, 90°, 135°, were considered as well as a predefined distance of one pixel in the formation of the cooccurrence matrices. Therefore, four cooccurrence matrices using the following four statistical measures were formed (see [] for details):

Energy - Angular Second Moment:
\[ f_1 = \sum_{i}\sum_{j} p(i,j)^2, \tag{1} \]
Correlation:
\[ f_2 = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i\,j)\,p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y}, \tag{2} \]
Inverse Difference Moment:
\[ f_3 = \sum_{i}\sum_{j} \frac{1}{1 + (i-j)^2}\, p(i,j), \tag{3} \]
Entropy:
\[ f_4 = -\sum_{i}\sum_{j} p(i,j) \log\bigl(p(i,j)\bigr), \tag{4} \]
where $N_g$ is the number of grey levels, $\mu_x$, $\mu_y$ are the marginal mean values of $x$ (along the horizontal pixel axis) and $y$ (along the vertical pixel axis), respectively, and $\sigma_x$, $\sigma_y$ are the corresponding standard

deviations. Thus, a set of 16 features describing spatial distribution in each window is obtained and used to formulate inputs for high level processing.

3. BATCH LEARNING OF MULTILAYER PERCEPTRONS

The most popular neural network model is the so-called Multi-Layer Perceptron (MLP). In an MLP, whose $l$-th layer contains $N_l$ nodes ($l = 1, \ldots, M$), artificial neurons operate according to the following equations:
\[ net_j^l = \sum_{i=1}^{N_{l-1}} w_{ij}^{l-1,l}\, y_i^{l-1}, \tag{5} \]
\[ y_j^l = f(net_j^l), \tag{6} \]
where $net_j^l$ is, for the $j$-th neuron in the $l$-th layer ($j = 1, \ldots, N_l$), the sum of its weighted inputs. The weights for connections from the $i$-th neuron at the $(l-1)$ layer to the $j$-th neuron at the $l$-th layer are denoted by $w_{ij}^{l-1,l}$; $y_j^l$ is the output of the $j$-th neuron that belongs to the $l$-th layer, and the logistic function $f(net_j^l) = (1 + \exp(-net_j^l))^{-1}$ is the $j$-th neuron's non-linear activation function.

Training an MLP to recognise abnormalities in image regions is typically realised by adopting an error correction strategy that adjusts the network weights through minimisation of learning error:
\[ E = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{N_M} \bigl(y_{j,p}^M - t_{j,p}\bigr)^2 = \sum_{p=1}^{P} E_p, \tag{7} \]
where $(y_{j,p}^M - t_{j,p})^2$ is the squared difference between the actual output value at the $j$-th output layer neuron, for an input sample $p$, and the target output value; $p$ is an index over input-output patterns.

A variety of approaches adapted from the theory of unconstrained optimisation have been applied to the minimisation of function $E$. For example, let us consider the class of batch learning algorithms that adjust the weights according to the iterative scheme:
\[ w^{k+1} = w^k + \eta^k \varphi^k, \quad k = 0, 1, 2, \ldots \tag{8} \]
Note that in (8) the weights of the MLP are expressed in a simplified form using vector notation. Thus, $w^k$ defines the current weight vector, $\varphi^k$ is a correction term, and $\eta^k$ is the learning rate at the $k$th iteration. Various choices of the correction term $\varphi^k$ give rise to distinct batch learning algorithms, which are usually classified as first-order or second-order algorithms depending on the derivative-related information they use to generate the correction term.
Thus, first-order algorithms are based on the first derivative of the learning error with respect to the weights, while second-order algorithms on the second derivative (see [4] for a review on first-order and second-order training algorithms).
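As a rough illustration of Eqs. (5)-(8), the sketch below runs the forward pass of a small MLP with logistic units, evaluates the batch error $E$, and applies one update of the iterative scheme using a first-order correction term (the negative gradient, obtained by back-propagating the error). It is a minimal NumPy sketch under stated assumptions: the function names, the absence of bias terms, and the layer sizes in the usage note are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def logistic(net):
    # Logistic activation f(net) = (1 + exp(-net))^(-1)
    return 1.0 / (1.0 + np.exp(-net))

def forward(weights, X):
    # Eqs. (5)-(6): weights[l] has shape (N_{l-1}, N_l); X has shape (P, N_0).
    # Returns the outputs of every layer, starting with the input itself.
    ys = [X]
    for W in weights:
        ys.append(logistic(ys[-1] @ W))
    return ys

def batch_error_and_grad(weights, X, T):
    # Eq. (7): E = 1/2 * sum over patterns and output neurons of (y - t)^2,
    # together with its gradient computed by back-propagation.
    ys = forward(weights, X)
    diff = ys[-1] - T
    E = 0.5 * np.sum(diff ** 2)
    grads = [None] * len(weights)
    delta = diff * ys[-1] * (1.0 - ys[-1])      # output-layer error signal
    for l in reversed(range(len(weights))):
        grads[l] = ys[l].T @ delta
        if l > 0:                               # propagate to the previous layer
            delta = (delta @ weights[l].T) * ys[l] * (1.0 - ys[l])
    return E, grads

def batch_step(weights, X, T, eta):
    # Eq. (8) with the first-order correction term phi = -grad E(w).
    E, grads = batch_error_and_grad(weights, X, T)
    return [W - eta * g for W, g in zip(weights, grads)], E
```

For example, `weights = [rng.normal(size=(16, 8)), rng.normal(size=(8, 2))]` with `rng = np.random.default_rng(0)` would set up a hypothetical network with 16 inputs, 8 hidden units and 2 outputs; repeated calls to `batch_step` with a small `eta` then perform plain batch gradient descent.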

A broad class of batch-type first-order algorithms, which are considered much simpler to implement than second-order methods, uses the correction term $\varphi^k = -\nabla E(w^k)$. The term $\nabla E(w^k)$ defines the gradient vector of the MLP and is obtained by means of back-propagation of the error through the layers of the network. The most popular algorithm of this class, called batch Back-Propagation (BP), applies the steepest descent method with a constant, heuristically chosen, learning rate $\eta$ that usually takes values in the interval (0,1) [7]. Values in this interval are considered small enough to ensure the convergence of the BP training algorithm and consequently the success of learning [4]. However, it is well known that this practice tends to be inefficient [4], [7] and the use of adaptive learning rate strategies is suggested in order to accelerate the learning process (see [4] and [9] for reviews on adaptive learning rate algorithms).

With regards to second-order training algorithms, nonlinear conjugate gradient methods, such as the Fletcher-Reeves or the Polak-Ribiere methods [], or variable metric methods, such as the Broyden-Fletcher-Goldfarb-Shanno method [4], or even modification of Newton's method [7], [8] have been proposed in the literature. These methods exploit derivative calculations and subminimization procedures (e.g. the nonlinear conjugate gradient methods) and/or approximations of various matrices (e.g. the Hessian matrix for the variable metric or quasi-Newton methods) to accelerate the learning process.

4. ON-LINE EVOLUTION STRATEGY

On-line training in neural networks is related to updating the network parameters after the presentation of each training example, which may be sampled with or without repetition. On-line training may be the appropriate choice for learning a task either because of the very large (or even redundant) training set, or because of the slowly time-varying nature of the task. Although batch training seems faster for small-size training sets and networks, on-line training is probably more efficient for large training sets and networks.
It helps to escape local minima and provides a more natural approach to learning in non-stationary environments. On-line methods seem to be more robust than batch methods as errors, omissions or redundant data in the training set can be corrected or ejected during the training phase. Additionally, training data can often be generated easily and in great quantities when the system is in operation, whereas they are usually scarce and precious before. Lastly, on-line training is necessary in order to learn and track time varying functions and continuously (re-)adapt in a changing environment.

Despite the abundance of methods for learning from examples, there are only few that can be used effectively for on-line learning. For example, the classic batch training algorithms cannot straightforwardly handle nonstationary data. Even when some of them are used in on-line training there exists the problem of catastrophic interference, in which training on new examples interferes excessively with previously learned examples leading to saturation and slow convergence [3], [34]. Below we present an on-line BP-seeded Differential Evolution (DE) strategy for on-line neural network training. Firstly, we briefly present the on-line BP learning stage of the proposed strategy. Then we proceed by describing the on-line DE stage. Note that the description below focuses on the problem of adapting the weights on-line, assuming that on-line evolution is always activated, and does not require the input and desired output data to be known a priori. Our experiments, reported in the next section, were also conducted under the same assumptions to test the robustness of our approach. Note, however, that in practice, whenever the changes of the environment are not considered significant and the performance is satisfactory, the weights and structure of the network should remain constant [6].

4.1. ON-LINE BACKPROPAGATION LEARNING

On-line BP learning strategies are usually based on the use of stochastic gradient descent due to the inherent efficiency of this method in time-varying environments [], [30], [3], [33], [34]. On-line learning has been analysed within the framework of statistics and it has been shown that it is asymptotically as effective as batch (also called off-line) learning. However, sensitivity to learning parameters is a common drawback of these schemes [8]. Advanced optimisation methods, such as conjugate gradient, variable metric, simulated annealing etc., cannot be used in this context, as they rely on a fixed error surface and need information from the whole training set [8]. In [0], a variant of the on-line BP has been proposed. The method can be considered as a meta-learning algorithm in the sense that it learns the learning rate parameters of an underlying base learning system (i.e. of the stochastic gradient descent). To this end, the new variant uses a memory-based learning rate adaptation schedule that exploits gradient related information from the current as well as the two previous pattern presentations:
\[ \eta^{k+1} = \eta^k + \gamma_1 \bigl\langle \nabla E_{p-1}(w^{k-1}), \nabla E_p(w^k) \bigr\rangle + \gamma_2 \bigl\langle \nabla E_{p-2}(w^{k-2}), \nabla E_{p-1}(w^{k-1}) \bigr\rangle. \tag{9} \]
At the start of the learning procedure, $k = 0$, the learning rate is set to a small positive value; e.g. the initial learning rate was set to 0.00 in our experiments.
Then, the weights are updated on-line, for each pattern $p$, following the iterative scheme:
\[ w^{k+1} = w^k - \eta^k \nabla E_p(w^k). \tag{10} \]
In (9), $\langle \cdot, \cdot \rangle$ stands for the usual inner product in $\mathbb{R}^n$, $E_p$ is the pattern-based error measure and $\nabla E_p$ is the corresponding gradient vector; $\eta$ is the learning rate, and $\gamma_1$, $\gamma_2$ are the meta-learning rates (γ < γ). Meta-learning rates are also in use by other on-line learning schemes, such as in [], [30], [3], [33], and can take various forms depending on the method. Previous experiments with the new variant have shown that the scheme of Eq. (9) seems to provide additional stabilization in the calculated values of the learning rate, and

helps the stochastic gradient descent to exhibit fast convergence and high success rate [0]. In addition, the method is characterised by low storage requirements and inexpensive computations, as it only uses already calculated information from the current, as well as the previous iteration. The idea of considering the gradient of the previous iterations in a learning rate adaptation scheme has also been proposed in the context of off-line learning. Particularly, Jacobs in the delta-bar-delta algorithm [] measures the running average of the current, $\nabla E(w^k)$, and past partial derivatives in order to check whether the current gradient has the same sign as the average gradient. Then the algorithm either increases the learning rate by adding a positive constant to the current value, or decreases it by multiplying the current value with a positive, smaller than one, constant. Finally, the weights are updated using a variant of Eq. (8), as the delta-bar-delta algorithm needs information from the whole training set (i.e. it performs batch learning). The on-line iterative scheme of (9)-(10) was shown to provide increased speed and higher possibility of good performance in different classes of problems when compared against the classic on-line BP and other meta-learning rate algorithms (see [0], [5] for details and comparisons). The role of on-line BP in the context of computer-assisted colonoscopic diagnosis is to initialise the population of the DE strategy with an initial approximation of the solution, as will be described below.

4.2. DIFFERENTIAL EVOLUTION STRATEGY

Evolution Strategies (ESs) are adaptive stochastic search methods that mimic the metaphor of natural biological evolution. The main differences between ESs and Genetic Algorithms (GAs) lie in that the self-adaptation of the mutation operator is a key feature of the ESs, and in that GAs prefer smaller mutation probability (rate) [], [9]. Here we use the Differential Evolution strategies, which have been designed as stochastic parallel direct search methods that can handle non-differentiable, non-linear and multimodal objective functions efficiently, and require few easily chosen control parameters [3].
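Before going further into the DE stage, the memory-based learning-rate schedule of Eqs. (9)-(10), which supplies the seed for the DE population, can be sketched in a few lines. This is only an illustrative reading of the two relations: the function names, the toy single-neuron gradient, the treatment of the first two presentations (missing past gradients are taken as zero), and the default values of `eta`, `gamma1` and `gamma2` are all assumptions made here, not settings reported in the paper.

```python
import numpy as np

def linear_pattern_grad(w, x, t):
    # Toy pattern-based gradient (assumption): E_p = 1/2 * (w.x - t)^2
    # for a single linear neuron, so grad E_p = (w.x - t) * x.
    return (w @ x - t) * x

def online_bp_adaptive(grad_fn, w, patterns, eta=0.01, gamma1=1e-5, gamma2=1e-6):
    # On-line training with a memory-based learning-rate adaptation.
    # grad_fn(w, x, t) returns the gradient of the pattern error E_p at w.
    g_prev1 = np.zeros_like(w)   # gradient from the previous presentation
    g_prev2 = np.zeros_like(w)   # gradient from two presentations ago
    etas = []
    for x, t in patterns:
        g = grad_fn(w, x, t)
        w = w - eta * g                                   # Eq. (10)
        eta = (eta
               + gamma1 * (g_prev1 @ g)                   # Eq. (9), first term
               + gamma2 * (g_prev2 @ g_prev1))            # Eq. (9), second term
        etas.append(eta)
        g_prev2, g_prev1 = g_prev1, g
    return w, etas
```

Only inner products of gradients that have already been computed are used, which reflects the low storage and computation cost noted above. The sketch applies no safeguard against the learning rate becoming too large or negative; a practical implementation would probably need one.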
Experimental results have shown that DE strategies have good convergence properties and outperform other evolutionary algorithms and annealing methods [3]. To apply DE strategies to neural network training we start with a specific number (NP) of $n$-dimensional weight vectors, as initial population, and evolve them over time; NP is fixed throughout the training process and the weight population is initialised by perturbing the approximate solution provided by the on-line BP (see Relations (9)-(10)). Thus, the on-line BP seeds the DE, so the initial population might be generated by adding normally distributed random deviations to the nominal solution. Let us now describe the proposed version of DE strategy that is used in the on-line evolution strategy. The weight vectors evolve randomly with each pattern presentation (iteration) through the relation
\[ v_i^{k+1} = w_i^k + \mu \bigl( w_{best}^k - w_i^k + w_{r_1}^k - w_{r_2}^k \bigr), \quad i = 1, \ldots, NP, \tag{11} \]
where $w_{best}^k$ is the best population member of the previous iteration, $\mu > 0$ is a real parameter (mutation constant) which regulates the contribution of the difference between weight vectors, and $w_{r_1}^k$, $w_{r_2}^k$ are weight vectors randomly chosen from the population with $r_1, r_2 \in \{1, 2, \ldots, i-1, i+1, \ldots, NP\}$, i.e. $r_1$, $r_2$ are random integers mutually different from the running index $i$. Aiming at increasing the diversity of the weight vectors further, a crossover-type operation is introduced in Relation (12). Thus, the so-called trial vector, $u_i^{k+1}$, $i = 1, \ldots, NP$, is generated:
\[ u_{ji}^{k+1} = \begin{cases} v_{ji}^{k+1}, & \text{if } r_j \le \rho \text{ or } j = rand, \\ w_{ji}^{k}, & \text{if } r_j > \rho \text{ and } j \ne rand. \end{cases} \tag{12} \]
This operation works as follows: the mutant weight vectors, $v_i^{k+1}$, $i = 1, \ldots, NP$, are mixed with the target vectors, $w_i^k$. Specifically, we randomly choose a real number $r_j$ in the interval [0,1] for each component $j$, $j = 1, 2, \ldots, n$, of $v_i^{k+1}$. This number is compared with $\rho \in [0,1]$ (crossover constant), and if $r_j \le \rho$ then the $j$-th component of the trial vector $u_i^{k+1}$ gets the value of the $j$-th component of the mutant vector, $v_i^{k+1}$; otherwise, it gets the value of the $j$-th component of the target vector, $w_i^k$. In (12), $rand$ is a randomly selected index that is used to ensure the trial vector has at least one component from the mutant vector. An application example of this operation is shown in Figure 4 for a seven-dimensional weight vector.

Figure 4. Illustration of the crossover operation (target vector, mutant vector, trial vector).

The trial vector is accepted for the next iteration if and only if it reduces the value of the pattern-based error measure $E_p$; otherwise the old value, $w_i^k$, is retained. This last operation, called selection, ensures that the fitness starts steadily decreasing at some iteration, and is described in Relation (13).
\[ w_i^{k+1} = \begin{cases} u_i^{k+1}, & \text{if } E_p(u_i^{k+1}) < E_p(w_i^k), \\ w_i^{k}, & \text{if } E_p(u_i^{k+1}) \ge E_p(w_i^k). \end{cases} \tag{13} \]
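The mutation, crossover and selection operations described above can be put together in a short sketch of one DE iteration, preceded by the seeding of the population from a nominal (on-line BP) solution. The helper names, the perturbation scale `sigma`, and the choice to pick the best member by evaluating the current error function on the current population are assumptions made for illustration; the error function is passed in as a generic callable, so the sketch is independent of any particular network.

```python
import numpy as np

def seed_population(w_bp, NP, sigma, rng):
    # Perturb the nominal solution with normally distributed deviations.
    return w_bp + sigma * rng.standard_normal((NP, w_bp.size))

def de_step(pop, error_fn, mu, rho, rng):
    # One DE iteration over a population of shape (NP, n).
    NP, n = pop.shape
    errors = np.array([error_fn(w) for w in pop])
    best = pop[np.argmin(errors)].copy()
    new_pop = pop.copy()
    for i in range(NP):
        # r1, r2: distinct random indices, both different from i
        others = [r for r in range(NP) if r != i]
        r1, r2 = rng.choice(others, size=2, replace=False)
        # Mutation: v = w_i + mu * (w_best - w_i + w_r1 - w_r2)
        v = pop[i] + mu * (best - pop[i] + pop[r1] - pop[r2])
        # Crossover: take the mutant component where r_j <= rho, and force
        # at least one mutant component through a random index.
        mask = rng.random(n) <= rho
        mask[rng.integers(n)] = True
        u = np.where(mask, v, pop[i])
        # Selection: keep the trial vector only if it lowers the error.
        if error_fn(u) < errors[i]:
            new_pop[i] = u
    return new_pop
```

In an on-line setting `error_fn` would be the pattern-based error for the pattern currently presented, so it changes from one call of `de_step` to the next; with a fixed `error_fn` the selection step guarantees that no member's error increases.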

The combined action of mutation and crossover operation is responsible for much of the effectiveness of DE search, and allows DE strategies to act as parallel, noise-tolerant hill-climbing algorithms, which efficiently search the whole space for solutions [3].

5. EXPERIMENTS AND RESULTS

In our experiments, the colonoscopy images and video frames were separated into windows of size 16 × 16 pixels with overlap of 8 pixels. Then the co-occurrence matrices algorithm was applied to gather information regarding pixel neighbourhoods of randomly selected image windows, as described in Section 2. The procedure results in 16-dimensional feature vectors, which are very noisy as no pre-filtering or segmentation technique is applied, and are used in the experiments described below. The learning parameters of the on-line evolution strategy have been set following the recommendations of [5] and [3]: $\gamma_1 = 0.05$, $\gamma_2 = 0.95$, $\mu = 0.5$, $\rho = 0.9$. Lastly, NP=00.

Figure 5. Colonoscopy images (a) and (b) used in the experiments.

In the first set of experiments, 000 MLPs with varying number of hidden nodes (from 8 to ) were trained using two batch learning algorithms, the Adaptive learning rate Backpropagation (ABP) proposed by Vogl [36], and the Levenberg-Marquardt (LM) method [7], as typical examples of first- and second-order training algorithms respectively. The MLPs were trained using 0 normal/0 abnormal samples from image windows that were randomly extracted from images (a) and (b) (see Figure 5) and tested with different tissue samples taken from the two images. Note that the malignant regions in these images belong to two different types: Image (a) is a low grade cancer, while Image (b) is a moderately differentiated carcinoma [5]. The performance of the trained MLPs has been tested on a set of 80 texture samples (40 normal and 40 malignant) randomly selected from the two images and different from the training set. Only a small sample out of 000 trained MLPs of the different architectures exhibited classification success of 90% or higher. Detailed results are shown in Figure 6.
More specifically, only 50 MLPs with 8 hidden nodes, out of the 000 trained, exhibited the desired classification success (see Figure 6, left part). Note also the significant difference in the number of the MLPs with acceptable classification success among the ABP and the LM trained ones. The LM algorithm also reveals a higher average percentage of classification success, as shown on the right part of Figure 6. In fact MLPs with hidden nodes exhibit the highest average in classification success (96.75%). Thus, hidden node MLPs were used in the second set of experiments.

Figure 6. Number of MLPs (out of 000) with classification success greater than 89% (left), and average classification success (right) for these MLPs, plotted against the number of hidden nodes for LM trained and ABP trained MLPs. Results are for the images of Figure 5.

In the second set of experiments, 000 MLPs of 6-- architecture were trained off-line to detect malignant regions in a frame of colonoscopy video sequence using a training set of 50 normal/50 abnormal patterns. Three batch-learning methods were comparatively evaluated in this round of experiments: the Levenberg-Marquardt that exhibited good performance in the previous round, the Scaled Conjugate Gradient (SCG) method [], that is considered according to the literature as a good alternative to the use of second-order methods [7], and the Rprop algorithm [5], which is a first-order method that applies heuristics to adapt a different learning rate for each weight of the network and successfully combines effectiveness with low computational requirements [7]. The percentage of classification success in testing (test set included 3969 patterns, i.e. the whole region covered in the video frame) for the 000 trained networks is shown in Figure 7. One can observe that it is not easy to locate weights that will allow the networks to detect malignant regions with a success of over 90%. For example, in Figure 7, only networks out of the 000 trained with the Rprop algorithm achieved recognition success from 90% to 100%. For the SCG method the corresponding number is 3 out of 000, while for the LM method this number is slightly higher, as 6 out of the 000 networks exhibited classification success between 90% and 100%.
The best result for each training method is: 9% for Rprop, 9.4% for LM and 9.6% for SCG. Rprop needs on average more epochs to converge than the SCG and LM methods but does not require heavy matrix computations or subminimisations. As a consequence, the average training time with Rprop was observed to be shorter than the corresponding time for SCG or LM. Thus, we decided to keep Rprop and run experiments with data from other video frames of the same video sequence. The best results are summarised in Table 1.
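The per-weight step adaptation that makes Rprop cheap per epoch can be illustrated with a minimal sketch. It is not the implementation used in the experiments: it assumes a simplified variant without weight backtracking that zeroes the stored gradient after a sign change, the constants (1.2, 0.5, step bounds) are commonly quoted defaults taken here as assumptions, and `grad_f` is a hypothetical toy objective standing in for the batch error gradient of an MLP.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One batch update. Only the sign of each partial derivative is used;
    every weight keeps its own step size. All constants are assumptions."""
    new_w, new_g, new_s = [], [], []
    for w, g, pg, s in zip(weights, grads, prev_grads, steps):
        if g * pg > 0:
            # Same sign as in the previous epoch: enlarge the step.
            s = min(s * eta_plus, step_max)
        elif g * pg < 0:
            # Sign change: a minimum was overshot, so shrink the step
            # and skip the move for this weight in this epoch.
            s = max(s * eta_minus, step_min)
            g = 0.0
        new_w.append(w - sign(g) * s)
        new_g.append(g)
        new_s.append(s)
    return new_w, new_g, new_s


def grad_f(w):
    """Gradient of the toy error (w0 - 3)^2 + 10 (w1 + 1)^2 (hypothetical)."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


def minimise(w, iters=200, step0=0.1):
    """Run rprop_step repeatedly from the starting point w."""
    prev = [0.0] * len(w)
    steps = [step0] * len(w)
    for _ in range(iters):
        w, prev, steps = rprop_step(w, grad_f(w), prev, steps)
    return w


print(minimise([0.0, 0.0]))
```

No matrix factorisation or line search appears in the update, which is consistent with the shorter training times reported above, although the sketch says nothing about the number of epochs a real network would need.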

[Figure 7: histogram of the number of trained networks (vertical axis) against the percentage of classification success in testing (horizontal axis; bins 50-59, 60-69, 70-79, 80-89, 90-100) for Rprop, L-M and SCG.]

Figure 7. Generalisation results for three batch-training algorithms.

From the results of Table 1, it is clear that Rprop exhibits the best overall performance compared with ABP. Note that the results of Table 1 have been achieved by training off-line frame-dedicated MLPs with hidden nodes using 300 patterns randomly chosen from each frame and testing using data of the same frame.

Method    Frame 1    Frame 2    Frame 3    Frame 4
Rprop     9%         9%         9%         93%
ABP       8%         85%        83%        8%

Table 1. Best classification success for two first-order batch-learning methods.

In the third set of experiments, the Rprop algorithm was compared with the classic on-line BP using data from another frame of the same video sequence. 300 patterns were used for training and 3969 for testing (i.e. the whole tissue region covered in the frame). The capability of the best-performing trained network (6-- MLPs were used) in assigning appropriate characterisations (normal/abnormal) to frame regions is shown in Table 2.

Method        Abnormal (%)    Normal (%)    Mean (%)
Rprop         83              96            93
On-line BP    73              93            88

Table 2. Best performance in terms of generalisation for Rprop and on-line BP.

Rprop reveals, in general, a higher percentage of success than on-line BP. The reader should, of course, keep in mind that Rprop minimises a batch error measure, i.e. it uses the true gradient of the error function, as it exploits information from all the training patterns. On-line BP, on the other hand, minimises a pattern-based error measure and works with an instantaneous approximation of the true gradient because information from only one pattern is used at each iteration. Therefore, on-line BP can be used for (re-)adapting to modified environment conditions, while Rprop requires all information about input-output patterns to be known a priori and, thus, fails to work when all the relevant features of the environment are not explicitly defined in advance. However, the results of the experiments made clear that the classic on-line BP needs further improvement in order to train networks to detect malignant regions with accuracy comparable to batch training methods.

In the fourth set of experiments, 6-- MLPs were trained on-line to detect malignant regions in a set of four frames from the same video sequence. The frames used in the two previous experiments were included in the set. The networks were trained on-line, following the iterative scheme (9) for adapting the learning rate, to recognise patterns from the first frame. Then on-line learning with differential evolution occurred as data from the second frame appeared at the input. The on-line evolution learning strategy continuously adapts the network as patterns from other frames are presented in random order at the input. In total, 00 patterns from the four frames of the video sequence were presented to the network during the training phase. The network was then tested using 5876 patterns from the four frames (approximately 4000 patterns cover the whole tissue region of a frame and include normal as well as malignant areas). The average capability of the trained networks in assigning appropriate characterisations to explored colon lining regions is presented in Table 3.

Method                  Frame 1    Frame 2    Frame 3    Frame 4
On-line BP              83%        84%        77%        88%
On-line BP seeded DE    93%        9%         84%        90%

Table 3. Normal/abnormal detection accuracy.

The on-line BP seeded DE scheme provides generalisation results close to the best results obtained by the batch training methods, as reported in the previous experiments. For example, the best SCG-trained dedicated network in the second experiment (trained off-line and tested using data from Frame ) had 9.6% success, and the best Rprop-trained dedicated network in the third experiment (trained off-line and tested using data from Frame ) had 93% success.

6. CONCLUSIONS AND FUTURE WORK

In this paper a new scheme for neural network-based colonoscopic diagnosis was introduced. The proposed on-line evolution strategy can be considered a hybrid algorithm: it uses an on-line Backpropagation strategy with adaptive learning rate to seed the initial population of the on-line Differential Evolution strategy. In our experiments, neural networks trained with the proposed on-line evolution strategy exhibited satisfactory performance under changing environmental conditions, as data from different frames were presented to the network.
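The idea of seeding Differential Evolution from a Backpropagation solution can be sketched in a few lines. This is only an illustration under stated assumptions: the seeding rule (noisy copies of one weight vector), the DE/rand/1/bin variant, the values F = 0.5 and CR = 0.9, and the names `seed_population`, `de_generation` and `toy_error` are all hypothetical choices, and `toy_error` stands in for the network error on the patterns currently presented.

```python
import random


def seed_population(seed_weights, size, spread, rng):
    """Hypothetical seeding rule: keep the BP solution itself and fill the
    rest of the population with Gaussian perturbations of it."""
    pop = [list(seed_weights)]
    while len(pop) < size:
        pop.append([w + rng.gauss(0.0, spread) for w in seed_weights])
    return pop


def de_generation(pop, error, rng, F=0.5, CR=0.9):
    """One DE/rand/1/bin generation with greedy one-to-one selection.
    F and CR are assumed values; the population needs at least 4 members."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Pick three distinct members, all different from the target.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        forced = rng.randrange(dim)  # at least one coordinate is mutated
        trial = []
        for d in range(dim):
            if d == forced or rng.random() < CR:
                trial.append(pop[a][d] + F * (pop[b][d] - pop[c][d]))
            else:
                trial.append(target[d])
        # The trial replaces the target only if it is not worse.
        new_pop.append(trial if error(trial) <= error(target) else target)
    return new_pop


def toy_error(w):
    """Stand-in for the network error: squared distance to a fixed point."""
    goal = [1.0, -2.0, 0.5]
    return sum((wi - gi) ** 2 for wi, gi in zip(w, goal))


rng = random.Random(0)
bp_weights = [0.8, -1.5, 0.0]  # pretend these came from on-line BP
population = seed_population(bp_weights, size=10, spread=0.1, rng=rng)
for _ in range(50):
    population = de_generation(population, toy_error, rng)
best = min(population, key=toy_error)
print(toy_error(bp_weights), toy_error(best))
```

Because selection is greedy and the seed itself is a population member, the best error in the population can never be worse than the error of the seed; in an on-line setting the error function would be re-evaluated as new patterns arrive.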

In the reported experiments no emphasis was put on fine-tuning the heuristic parameters of our scheme; classic values found in the relevant literature on the Differential Evolution strategy were used instead. In future work we will fully investigate the properties, study the effect of the heuristic parameters and evaluate the full potential of the hybrid learning strategy in colonoscopic diagnosis by means of extensive testing on long video sequences and interpretation of complex tissue regions.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the contribution of Dr. S. Karkanis (Technological Educational Institute of Lamia, Greece) and D. Iakovidis (University of Athens, Greece) in the acquisition of video sequences and extraction of features.

REFERENCES

[1] Almeida, L.B., Langlois, T., Amaral, J.D., and Plakhov, A. (1998). Parameter adaptation in stochastic optimisation, in On-line Learning in Neural Networks, Saad, D. (ed.), 111-134, Cambridge University Press.
[2] Angeline, P. (1997). Tracking extrema in dynamic environments, 6th Annual Conference on Evolutionary Programming VI, 335-345, Springer.
[3] Anguita, D. (2001). Smart adaptive systems: state of the art and future directions of research, in Proceedings of the European Symposium on Intelligent Technologies, Hybrid Systems and their Implementation on Smart Adaptive Systems-EUNITE 2001, Tenerife, Spain, -4.
[4] Battiti, R. (1992). First- and second-order methods for learning: between steepest descent and Newton's method, Neural Computation, 4, 141-166.
[5] Delaney, P.M., Papworth, G.D., and King, R.G. (1998). Fibre optic confocal imaging (FOCI) for in vivo subsurface microscopy of the colon, in Methods in disease: Investigating the gastrointestinal tract, Preedy, V.R. and Watson, R.R. (eds.), Greenwich Medical Media, London, UK.
[6] Doulamis, A.D., Doulamis, N.D., and Kollias, S.D. (2000). On-line retrainable neural networks: improving the performance of neural networks in image analysis problems, IEEE Transactions on Neural Networks, 11, 137-155.
[7] Hagan, M. and Menhaj, M. (1994). Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks, 5, 989-993.
[8] Hanka, R., Harte, T.P., Dixon, A.K., Lomas, D.J., and Britton, P.D. (1996). Neural networks in the interpretation of contrast-enhanced magnetic resonance images of the breast, Healthcare Computing 1996, Harrogate, UK, 75 83.
[9] Haralick, R.M. (1979). Statistical and structural approaches to texture, IEEE Proc., 67, 786-804.
[10] Innocent, P.R., Barnes, M., and John, R. (1997). Application of the fuzzy ART/MAP and MinMax/MAP neural network models to radiographic image classification, Artificial Intelligence in Medicine, 11, 241-263.
[11] Jacobs, R. (1988). Increased rates of convergence through learning rate adaptation, Neural Networks, 1, (4), 295-307.

[12] Karkanis, S., Magoulas, G.D., and Theofanous, N. (2000). Image recognition and neuronal networks: Intelligent systems for the improvement of imaging information, Minimally Invasive Therapy and Allied Technologies, 9, 225-230.
[13] Karkanis, S.A., Magoulas, G.D., Iakovidis, D.K., Karras, D.A., and Maroulis, D.E. Evaluation of Textural Feature Extraction Schemes for Neural Network-based Interpretation of Regions in Medical Images, Proceedings of IEEE International Conference on Image Processing, Thessaloniki, Greece, October 7-10, 2001.
[14] Kwoh, C.K. (1995). Probabilistic reasoning from correlated objective data, Ph.D. Thesis, Imperial College, London, UK.
[15] Kudo, S.E., Kashida, H., Tamura, T., Kogure, E., Imai, Y., Yamano, H., and Hart, A.R. (2000). Colonoscopic diagnosis and management of nonpolypoid early colorectal cancer, World Journal of Surgery, vol. 24, no. 9, pp. 1081-1090.
[16] Leondes, C.T. (1998). Image Processing and Pattern Recognition, Neural Network Systems Techniques and Applications Series, 5, Academic Press.
[17] Looney, C.G. (1997). Pattern recognition using neural networks, Oxford University Press, Oxford, UK.
[18] Magoulas, G.D., Vrahatis, M.N., Grapsa, T.N., and Androulakis, G.S. (1997). Neural network supervised training based on a dimension reducing method, in S.W. Ellacott, J.C. Mason and I.J. Anderson, eds., Mathematics of Neural Networks: Models, Algorithms and Applications, pp. 245-249, Kluwer.
[19] Magoulas, G.D., Vrahatis, M.N., and Androulakis, G.S. (1999). Improving the convergence of the backpropagation algorithm using learning rate adaptation methods, Neural Computation, 11, 1769-1796.
[20] Magoulas, G.D., Plagianakos, V.P., and Vrahatis, M.N. (2001). Adaptive stepsize algorithms for online training of neural networks, Nonlinear Analysis: Theory, Methods and Applications, 47, 3425-3430.
[21] Möller, M. (1993). A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks, 6, 525-533.
[22] Nagata, S., Tanaka, S., Haruma, K., Yoshihara, M., Sumii, K., Kajiyama, G., and Shimamoto, F. (2000). Pit pattern diagnosis of early colorectal carcinoma by magnifying colonoscopy: clinical and histological implications, International Journal of Oncology, 16, 927-934.
[23] Phee, S.J., Ng, W.S., Chen, I.M., Seow-Choen, F., and Davies, B.L. (1998). Automation of colonoscopy part II: Visual-control aspects, IEEE Engineering in Medicine and Biology, 81-88.
[24] Plagianakos, V.P., Magoulas, G.D., and Vrahatis, M.N. (2001). Supervised training using global search methods, in Advances in Convex Analysis and Global Optimization, Hadjisavvas, N. and Pardalos, P. (eds.), vol. 54, Nonconvex Optimization and its Applications, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 4-43.
[25] Plagianakos, V.P., Magoulas, G.D., and Vrahatis, M.N. (2001). Learning rate adaptation in stochastic gradient descent, in Advances in Convex Analysis and Global Optimization, Hadjisavvas, N. and Pardalos, P. (eds.), vol. 54, Nonconvex Optimization and its Applications, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 433-444.
[26] Riedmiller, M. and Braun, H. (1993). A direct adaptive method for faster back-propagation learning: the Rprop algorithm, in Proceedings of IEEE International Conference on Neural Networks, San Francisco, 586-591.
[27] Roeh, N.M. and Pedreira, C.E. (00). An online learning approach: a methodology for time varying applications, Neural Computing and Applications, 0, 0-07.
[28] Saad, D. (1998). On-line learning in neural networks, Cambridge University Press.

[29] Salomon, R. and Eggenberger, P. (1998). Adaptation on the evolutionary time scale: a working hypothesis and basic experiments, in Proceedings of the 3rd European Conference on Artificial Evolution (AE'97), Nimes, France, Lecture Notes in Computer Science vol. 1363, Springer.
[30] Schraudolph, N.N. (1998). Online local gain adaptation for multi-layer perceptrons, Technical Report, IDSIA-09-98, IDSIA, Lugano, Switzerland.
[31] Schraudolph, N.N. (1999). Local gain adaptation in stochastic gradient descent, Technical Report, IDSIA-09-99, IDSIA, Lugano, Switzerland.
[32] Storn, R. and Price, K. (1997). Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization, 11, 341-359.
[33] Sutton, R.S. (1992). Adapting bias by gradient descent: an incremental version of delta-bar-delta, in Proceedings of the 10th National Conference on Artificial Intelligence, MIT Press, 171-176.
[34] Sutton, R.S. and Whitehead, S.D. (1993). Online learning with random representations, in Proceedings of the 10th International Conference on Machine Learning, Morgan Kaufmann, 314-321.
[35] Vavak, F. and Fogarty, T.C. (1996). A comparative study of steady state and generational genetic algorithms, Evolutionary Computing: AISB Workshop, Lecture Notes in Computer Science vol. 1143, Springer.
[36] Vogl, T.P., Mangis, J.K., Rigler, A.K., Zink, W.T., and Alkon, D.L. (1988). Accelerating the convergence of the back-propagation method, Biological Cybernetics, 59, 257-263.