Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification



J. Chem. Inf. Comput. Sci. 2003, 43, 1882-1889

Evgeny Byvatov, Uli Fechner, Jens Sadowski, and Gisbert Schneider*

Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Strasse 11, D-60439 Frankfurt, Germany, and AstraZeneca R&D Mölndal, SC 264, S-431 83 Mölndal, Sweden

Received June 13, 2003

Support vector machine (SVM) and artificial neural network (ANN) systems were applied to a drug/nondrug classification problem as an example of binary decision problems in early-phase virtual compound filtering and screening. The results indicate that solutions obtained by SVM training seem to be more robust with a smaller standard error compared to ANN training. Generally, the SVM classifier yielded slightly higher prediction accuracy than ANN, irrespective of the type of descriptors used for molecule encoding, the size of the training data sets, and the algorithm employed for neural network training. The performance was compared using various descriptor sets and descriptor combinations based on the 120 standard Ghose-Crippen fragment descriptors, a wide range of 180 different properties and physicochemical descriptors from the Molecular Operating Environment (MOE) package, and 225 topological pharmacophore (CATS) descriptors. For the complete set of 525 descriptors cross-validated classification by SVM yielded 82% correct predictions (Matthews cc = 0.63), whereas ANN reached 80% correct predictions (Matthews cc = 0.58). Although SVM outperformed the ANN classifiers with regard to overall prediction accuracy, both methods were shown to complement each other, as the sets of true positives, false positives (overprediction), true negatives, and false negatives (underprediction) produced by the two classifiers were not identical. The theory of SVM and ANN training is briefly reviewed.
INTRODUCTION

Early-phase virtual screening and compound library design often employs filtering routines which are based on binary classifiers and are meant to eliminate potentially unwanted molecules from a compound library [1,2]. Currently two classifier systems are most often used in these applications: PLS-based classifiers [3,4] and various types of artificial neural networks (ANN) [5-9]. Typically, these systems yield an average overall accuracy of 80% correct predictions for binary decision tasks following the likeness concept in virtual screening [2,10]. The support vector machine (SVM) approach was first introduced by Vapnik as a potential alternative to conventional artificial neural networks [11,12]. Its popularity has grown ever since in various areas of research, and first applications in molecular informatics and pharmaceutical research have been described [13-15]. Although SVM can be applied to multiclass separation problems, its original implementation solves binary class/nonclass separation problems. Here we describe application of SVM to the drug/nondrug classification problem, which employs a class/nonclass implementation of SVM.

Both SVM and ANN algorithms can be formulated in terms of learning machines. The standard scenario for classifier development consists of two stages: training and testing. During the first stage the learning machine is presented with labeled samples, which are basically n-dimensional vectors with a class membership label attached. The learning machine generates a classifier for prediction of the class label of the input coordinates. During the second stage, the generalization ability of the model is tested.

Currently various sets of molecular descriptors are available [16]. For application to drug/nondrug classification of compounds, the molecules are typically represented by n-dimensional vectors [6,7]. In this work, we focused on the fragment-based Ghose-Crippen (GC) descriptors [17-19] which were used in the original work of Sadowski and Kubinyi for drug/nondrug classification [7], descriptors provided by the MOE software package (Molecular Operating Environment. Chemical Computing Group Inc., Montreal, Canada), and CATS topological pharmacophores [20]. Having defined this molecular representation, the task of the present study was to compare the classification ability of standard SVM and feed-forward ANN on the drug/nondrug data. A WWW-based interface for calculating the drug-likeness score of a molecule using our SVM solution based on the CATS descriptor was developed and can be found at URL: http://gecco.org.chemie.uni-frankfurt.de/gecco.html.

* Corresponding author phone: +49-69 79829821; fax: +49-69 7982-9826; e-mail: gisbert.schneider@modlab.de.

10.1021/ci0341161 CCC: $25.00 © 2003 American Chemical Society. Published on Web 09/27/2003.

DATA AND METHODS

Data Sets. For SVM and ANN training we used the sets of drug and nondrug molecules prepared by Kubinyi and Sadowski [7]. From the original data set 9208 molecules could be processed by our descriptor generation software. The final working set contained 4998 drugs and 4210 nondrug molecules. Three sets of descriptors were calculated: counts of the standard 120 Ghose-Crippen descriptors [17-19], 180 descriptors from MOE (Molecular Operating Environment. Chemical Computing Group Inc., Montreal, Canada), and 225 topological pharmacophore (CATS) descriptors [20]. MOE descriptors include various 2D and 3D descriptors such as volume and shape descriptors, atom and bond counts, Kier-Hall connectivity and kappa shape indices, adjacency and distance matrix descriptors, pharmacophore feature descriptors, partial charges, potential energy descriptors, and conformation-dependent charge descriptors. Before calculating MOE descriptors, single 3D conformers were generated by CORINA [21]. The 225 CATS descriptors were calculated using our own software taking into consideration pairs of atom types separated by up to 15 bonds (URL: http://gecco.org.chemie.uni-frankfurt.de/gecco.html) [20]. All 225 descriptor columns were individually autoscaled. An alternative would have been block-scaling where each descriptor class is autoscaled as a whole, which was not applied here.

Support Vector Machine. SVM classifiers are generated by a two-step procedure: First, the sample data vectors are mapped ("projected") to a very high-dimensional space. The dimension of this space is significantly larger than the dimension of the original data space. Then, the algorithm finds a hyperplane in this space with the largest margin separating classes of data. It was shown that classification accuracy usually depends only weakly on the specific projection, provided that the target space is sufficiently high dimensional [11]. Sometimes it is not possible to find the separating hyperplane even in a very high-dimensional space. In this case a tradeoff is introduced between the size of the separating margin and penalties for every vector which is within the margin [11]. The basic theory of SVM will be briefly reviewed in the following. The separating hyperplane is defined as

$$D(\mathbf{x}) = (\mathbf{w} \cdot \mathbf{x}) + w_0$$

Here $\mathbf{x}$ is a sample vector mapped to a high dimensional space, and $\mathbf{w}$ and $w_0$ are parameters of the hyperplane that SVM will estimate.
Then the margin $\tau$ can be expressed as the largest value for which

$$\frac{y_k D(\mathbf{x}_k)}{\lVert \mathbf{w} \rVert} \ge \tau$$

holds for every training sample $k$. Without loss of generality we can apply the constraint $\tau \lVert \mathbf{w} \rVert = 1$ to $\mathbf{w}$. In this case maximizing $\tau$ is equivalent to minimizing $\lVert \mathbf{w} \rVert$, and SVM training becomes the problem of finding the minimum of a function under constraints:

$$\text{minimize } \eta(\mathbf{w}) = \tfrac{1}{2} (\mathbf{w} \cdot \mathbf{w}) \quad \text{subject to } y_i [(\mathbf{w} \cdot \mathbf{x}_i) + w_0] \ge 1$$

This problem is solved by introduction of Lagrange multipliers and minimization of the function

$$Q(\mathbf{w}, w_0, \alpha) = \tfrac{1}{2} (\mathbf{w} \cdot \mathbf{w}) - \sum_{i=1}^{n} \alpha_i \{ y_i [(\mathbf{w} \cdot \mathbf{x}_i) + w_0] - 1 \}$$

Here $\alpha_i$ are Lagrange multipliers. Differentiating over $\mathbf{w}$ and $w_0$ and substituting we obtain

$$\max_{\alpha} \; Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \quad \text{subject to } \sum_{i=1}^{n} y_i \alpha_i = 0; \; \alpha_i \ge 0, \; i = 1, \ldots, n$$

When perfect separation is not possible slack variables are introduced for sample vectors which are within the margin, and the optimization problem can be reformulated:

$$\text{minimize } \eta(\mathbf{w}) = \tfrac{1}{2} (\mathbf{w} \cdot \mathbf{w}) + C \sum_{i} \xi_i \quad \text{subject to } y_i [(\mathbf{w} \cdot \mathbf{x}_i) + w_0] \ge 1 - \xi_i, \; \xi_i \ge 0$$

Here $\xi_i$ are slack variables. These variables are not equal to zero only for those vectors which are within the margin. Introducing Lagrange multipliers again we finally obtain

$$\max_{\alpha} \; Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \quad \text{subject to } \sum_{i=1}^{n} y_i \alpha_i = 0, \; C \ge \alpha_i \ge 0, \; i = 1, \ldots, n$$

This is a quadratic programming (QP) problem for which several efficient standard methods are known [22]. Due to the very high dimensionality of the QP problem, which typically arises during SVM training, an extension of the algorithm for solving QP is used in SVM applications [23].

Figure 1. Principle of SVM classification. The task was to separate two classes of objects indicated by squares and circles. Squares represent nonclass samples ("negative examples", e.g. nondrugs) and circles are class members ("positive examples", e.g. drugs). D(x) is the decision function defining class membership according to the SVM classifier which is represented by the separating line (D(x) = 0). The margin is indicated by dotted lines. Support vectors are indicated by filled objects (x1, x2, x3, x4). $\xi_i$ are slack variables for support vectors that are not lying on the margin border. $y_i$ are label-variables equal to 1 for positive examples (class membership) and -1 for negative examples (nonclass membership). See text for details.

A geometrical illustration of the meaning of slack variables and Lagrange multipliers is given in Figure 1. Points classified by SVM can be divided into two groups, support vectors and nonsupport vectors. Nonsupport vectors are classified correctly by the hyperplane and are located outside the separating margin. Slack variables and Lagrange multipliers for them are equal to zero. Parameters of the hyperplane do not depend on them, and even if their position is changed the separating hyperplane and margin will remain unchanged, provided that these points stay outside the margin. Other points are support vectors, and they are the points which determine the exact position of the hyperplane. For all support vectors the absolute values of the slack variables are equal to the distances from these points to the edge of the separating margin. These distances are defined in units of half of the width of the separating margin. For correctly classified points within the separating margin, slack variable values are between zero and one. For misclassified points within the margin the values of the slack variables are between one and two. For other misclassified points they are greater than two. For points that are lying on the edge of the margin, Lagrange multipliers are between zero and C, and slack variables for these points are still equal to zero. For all other points, for which the values of slack variables are larger than zero, Lagrange multipliers assume the value of C.

Explicit mapping to a very high-dimensional space is not required if calculation of the scalar product in this high dimensional space of every two vectors is feasible. This scalar product can be defined by introducing a kernel function $(\mathbf{x} \cdot \mathbf{x}') = K(\mathbf{x}, \mathbf{x}')$ [24], where $\mathbf{x}$ and $\mathbf{x}'$ are vectors in a low-dimensional space for which a kernel function that corresponds to a scalar product in a high dimensional space is defined. Various kernels may be applied [25]. In our case, we used a kernel function of a fifth-order polynomial:

$$K(\mathbf{x}, \mathbf{x}') = ((\mathbf{x} \cdot \mathbf{x}') s + r)^5$$

This kernel corresponds to the decision function

$$f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i K(\mathbf{x}_i^{sv}, \mathbf{x}) + b \Big)$$

where $\alpha_i$ are Lagrange multipliers determined during training of SVM and $y_i$ are the class labels of the support vectors. The sum is only over support vectors $\mathbf{x}^{sv}$. Lagrange multipliers for all other points are equal to zero.
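As a rough illustration of the kernel and decision function above, the following Python sketch evaluates a fifth-order polynomial kernel and the resulting class label. It is not the SVM-Light implementation used in the study; the function names, the two toy support vectors, and the signed coefficients (each standing for a product of Lagrange multiplier and class label) are assumptions chosen only for demonstration.

```python
def poly_kernel(x, z, s=1.0, r=1.0, degree=5):
    """Polynomial kernel K(x, z) = ((x . z) * s + r) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot * s + r) ** degree


def decision_value(x, support_vectors, coeffs, b, s=1.0):
    """Kernel expansion over the support vectors plus the shift b.

    coeffs holds one signed weight per support vector (hypothetical values,
    each playing the role of alpha_i * y_i).
    """
    total = sum(c * poly_kernel(sv, x, s=s) for sv, c in zip(support_vectors, coeffs))
    return total + b


def classify(x, support_vectors, coeffs, b, s=1.0):
    """Return +1 (class) or -1 (nonclass) from the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, coeffs, b, s=s) > 0 else -1


# Toy model, not taken from the paper: two support vectors in two dimensions.
toy_svs = [(1.0, 0.0), (-1.0, 0.0)]
toy_coeffs = [0.5, -0.5]
toy_b = 0.0

label_right = classify((2.0, 0.0), toy_svs, toy_coeffs, toy_b)
label_left = classify((-2.0, 0.0), toy_svs, toy_coeffs, toy_b)
```

Only the support vectors enter the sum, which is why a trained model of this form can be stored compactly even when the training set is large.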
Parameter b determines the shift of the hyperplane, and it is also found during SVM training. Simultaneous scaling of the s, r, and b parameters does not change the decision function. Thus, we can simplify the kernel by setting r equal to one:

$$K(\mathbf{x}, \mathbf{x}') = ((\mathbf{x} \cdot \mathbf{x}') s + 1)^5$$

In this case only the kernel parameter s and error tradeoff C must be tuned. Parameter C is not present explicitly in this equation; it is set up as a penalty for the misclassification error before the training of SVM is performed.

For tuning parameters s and C, four-times cross-validation of training data was applied, and values for s and C that maximize accuracy were then chosen. Accuracy maximization was performed by heuristics based gradient descent [26]. Basically, the following procedure was applied. The data set was divided into two parts, training and validation set. The validation subset was put aside and used only for estimation of the performance of the trained classifier. Training data were divided into four nonoverlapping subsets. The SVM parameters to be determined were set to reasonable initial values. Then, the SVM was trained on the training data excluding one of the four subsets, and the performance of the obtained SVM classifier was estimated with the excluded subset. This procedure was repeated for each subset, and an average performance of the SVM classifier was obtained.

Figure 2. Architecture of artificial neural networks. Formal neurons are drawn as circles, weights are represented by lines connecting the neuron layers. Fan-out neurons are drawn in white, sigmoidal units in black, and linear units in gray. (a) conventional three-layered feed-forward system ("architecture I"); (b) network architecture used by Ajay and co-workers for drug-likeness prediction ("architecture II") [6].

For SVM training we used freely available SVM software (SVM-Light package; URL: http://svmlight.joachims.org/) [26,27]. A Linux-based LSF (Load Sharing Facility; Platform Computing GmbH, D-40878 Ratingen, Germany) cluster was used for determination of the cross-validation error to reduce calculation time.
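The four-times cross-validation used for tuning s and C can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the study maximized accuracy with a heuristic gradient-descent search, whereas this example substitutes a plain search over a small candidate list, and a trivial one-dimensional threshold classifier stands in for the SVM so that the code runs on its own. All function names and the toy data are hypothetical.

```python
import random


def make_folds(n_samples, n_folds=4, seed=0):
    """Shuffle sample indices and deal them into non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def cross_validated_accuracy(X, y, fit, predict, params, n_folds=4, seed=0):
    """Train on all folds but one, score on the excluded fold, and average."""
    folds = make_folds(len(X), n_folds, seed)
    scores = []
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit([X[i] for i in train], [y[i] for i in train], **params)
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


def select_parameters(X, y, fit, predict, candidates, n_folds=4):
    """Pick the candidate parameter set with the highest averaged accuracy."""
    return max(candidates,
               key=lambda p: cross_validated_accuracy(X, y, fit, predict, p, n_folds))


# Stand-in classifier (not an SVM): label +1 above a threshold, -1 otherwise.
def fit_threshold(X_train, y_train, threshold):
    return threshold


def predict_threshold(model, x):
    return 1 if x[0] > model else -1


toy_X = [(float(v),) for v in range(-10, 10)]
toy_y = [1 if x[0] >= 0 else -1 for x in toy_X]
candidates = [{"threshold": t} for t in (-5.5, -0.5, 4.5)]
best = select_parameters(toy_X, toy_y, fit_threshold, predict_threshold, candidates)
```

In a real run, `fit` and `predict` would wrap the SVM trainer, the candidate dictionaries would carry s and C, and the validation subset set aside beforehand would not be passed to any of these functions.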
All calculations were performed using the MATLAB package (MATLAB 2002, The mathematical laboratory. The MathWorks GmbH, D-52064 Aachen, Germany).

ARTIFICIAL NEURAL NETWORK

Conventional two-layered neural networks with a single output neuron were used for ANN model development (Figure 2a) [26]. As a result of network training a decision function is chosen from the family of functions represented by the network architecture. This function family is defined by the complexity of the neural network: number of hidden layers, number of neurons in these layers, and topology of the network. The decision function is determined by choosing appropriate weights for the neural network. Optimal weights usually minimize an error function for the particular network architecture. The error function describes the deviation of predicted target values from observed or desired values. For our class/nonclass classification problem the target values were 1 for class (drugs) and -1 for nonclass (nondrugs). A standard two-layered neural network with a single output neuron can be represented by the following equation

$$y = \tilde{g}\Big( \sum_{j=1}^{M} w_{1j}^{(2)} \, g\Big( \sum_{i=1}^{d} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \Big) + w_{10}^{(2)} \Big)$$

with the error function

$$E = \sum_{k=1}^{n} (y(\mathbf{x}_k) - y_k)^2$$

In this work, $\tilde{g}$ is a linear function and $g$ is a tan-sigmoid transfer function. A second type of network architecture containing additional connections from the input layer to the output layer was trained to reimplement the original drug/nondrug ANN developed by Ajay and co-workers (Figure 2b) [6].

Training of neural networks is typically performed with variations of gradient descent based algorithms [26], trying to minimize an error function. To avoid overfitting, cross-validation can be used for finding an earlier point of training [28]. In this work the neural network toolbox from MATLAB was used. Data were preprocessed identically to SVM based learning. We applied the following training algorithms to ANN optimization in their default versions provided by MATLAB: gradient descent with variable learning rate [29,30], conjugated gradient descent [30,31], scaled conjugated gradient descent [32], quasi-Newton algorithm [33], Levenberg-Marquardt (LM) [34,35], and automated regularization [36]. For each optimization ten-times cross-validation was performed (80+20 splits into training and test data), where the ANN weights and biases were optimized using the training data, and prediction accuracy was measured using test data to determine the number of training epochs, i.e., the endpoint of the training process. This was performed to reduce the risk of overfitting. It should be noted that the validation data were left untouched.

Table 1. Cross-Validated Results of Machine Learning (a)

descriptors          % correct, ANN   % correct, SVM   Matthews cc, ANN   Matthews cc, SVM
GC                   79.25 ± 0.66     80.01 ± 0.087    0.567 ± 0.012      0.592 ± 0.002
MOE                  77.89 ± 0.74     80.19 ± 0.74     0.537 ± 0.013      0.593 ± 0.016
CATS_225             72.13 ± 0.88     73.90 ± 0.51     0.432 ± 0.013      0.485 ± 0.011
all (GC+MOE+CATS)    80.05 ± 1.02     82.24 ± 0.66     0.579 ± 0.018      0.633 ± 0.010

(a) Average values and standard deviations are given. The Levenberg-Marquardt training method was used for ANN training.

MODEL VALIDATION

The SVM model for drug/nondrug classification of a pattern $\mathbf{x}$ was

$$\mathrm{SVM}(\mathbf{x}) = \sum_{i} a_i K(\mathbf{x}_i^{SV}, \mathbf{x}) + b$$

Here, i runs only over support vectors (SV). The value of SVM(x) is either positive ("drug") or negative ("nondrug"). The ANN model for drug/nondrug classification produced values in ]-1,1[, where a positive value meant "drug" and a negative value "nondrug".
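The two-layered network equation from the ARTIFICIAL NEURAL NETWORK section, together with the sign rule used for classification, can be illustrated with a short Python sketch. It assumes tanh for the tan-sigmoid hidden units (MATLAB's tansig is mathematically equivalent to tanh) and an identity output unit; the weights below are made-up values for two hidden neurons, not trained parameters from the study.

```python
import math


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Two-layered network: tanh hidden units feeding one linear output unit."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def squared_error(outputs, targets):
    """Sum of squared deviations between network outputs and target values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


def label(output):
    """Positive output means drug, negative output means nondrug."""
    return "drug" if output > 0 else "nondrug"


# Hypothetical weights for a network with two inputs and two hidden neurons.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = 0.0

out = forward((0.5, -1.0), W1, b1, w2, b2)
predicted = label(out)
```

Training would adjust W1, b1, w2, and b2 to reduce `squared_error` over the training set, with targets of 1 for drugs and -1 for nondrugs.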
Classification accuracy was evaluated based on prediction accuracy, i.e., percent of test compounds correctly classified, and the correlation coefficient according to Matthews [37]:

$$cc = \frac{NP - OU}{\sqrt{(N + O)(N + U)(P + O)(P + U)}}$$

where P, N, O, and U are the number of true positive, true negative, false positive, and false negative predictions, respectively. Drugs were considered as the positive set; the nondrug molecules formed the negative set. The values of cc can range from -1 to 1. Perfect prediction gives a correlation coefficient of 1.

SVM and ANN models were developed using various sizes of training data to measure the influence of the size of the training set on the quality of the classification model. The number of training samples was iteratively diminished: Starting with an 80+20 random split of all available samples into training and validation subsets, at each of the following iterations we diminished the size of the training set to only 80% of the number of samples of the previous iteration. This allowed us to obtain better sampling for small training sets. 10-times cross-validation was performed, and average values of prediction accuracy and cc were calculated.

RESULTS AND DISCUSSION

The main aim of this study was to compare SVM and ANN classifiers in their ability to distinguish between sets of drugs and nondrugs. We trained different neural network topologies, and performance of the best network was compared to the SVM classifier. Two types of ANN architecture were considered: standard feed-forward networks with one hidden layer ("architecture I") and a feed-forward network with one hidden layer with additional direct connections from input neurons to the output ("architecture II") (Figure 2). The first type of ANN was used by Sadowski and Kubinyi in their original work on drug-likeness prediction [7]; the second architecture was employed by Ajay and co-workers serving the same purpose [6]. Using these networks and the GC descriptors in combination with the Levenberg-Marquardt training method, classification accuracy was identical to the original results (on average 80% correct) despite the use of a different training technique and different training data (Table 1). This observation substantiates the original findings. Both network types performed identically considering the error margin (approximately 80% correct classification). For some of the training algorithms a slightly lower standard deviation of the prediction accuracy was observed for architecture I (data not shown). Since the additional connections in network architecture II did not contribute to a greater accuracy of the model, we used only the standard feed-forward network with one hidden layer containing two neurons (architecture I) for further analysis.

For each training method and combination of input variables (descriptors) networks with different numbers of hidden neurons (2-10 neurons) were trained. We did not observe an overall best training algorithm. The Levenberg-Marquardt method was used for the development of the final ANN model. Also, we did not observe an improved classification result when the number of hidden neurons was larger than two (data not shown). ANN architecture I with two hidden neurons yielded the overall best cross-validated prediction result for all descriptors (GC+MOE+CATS), 80% correct predictions (cc = 0.58). The rank order of descriptor sets with regard to the overall classification accuracy was as follows: All > GC > MOE > CATS (Table 1). It should be stressed that the differences in classification accuracy are minute for the descriptors All, MOE, and GC and should be regarded as comparable considering a standard deviation of 1%. The CATS descriptor led to approximately 5% lower accuracy.
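The Matthews correlation coefficient defined under MODEL VALIDATION can be computed directly from the four prediction counts. The Python sketch below uses the same symbols as the text (P, N, O, U); the example counts are invented for illustration and are not results from this study, and returning 0 for an empty denominator is a convention assumed here rather than something the text specifies.

```python
import math


def matthews_cc(P, N, O, U):
    """Matthews correlation coefficient.

    P: true positives, N: true negatives,
    O: false positives (overprediction), U: false negatives (underprediction).
    """
    denominator = math.sqrt((N + O) * (N + U) * (P + O) * (P + U))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (N * P - O * U) / denominator


def fraction_correct(P, N, O, U):
    """Prediction accuracy as the fraction of correctly classified compounds."""
    return (P + N) / (P + N + O + U)


# Hypothetical confusion counts for 100 test compounds.
cc_example = matthews_cc(P=45, N=37, O=8, U=10)
accuracy_example = fraction_correct(P=45, N=37, O=8, U=10)
```

Unlike the fraction correct, cc accounts for all four cells of the confusion table at once, so it stays informative when the two classes are of unequal size.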

Figure 3. Average cross-validated prediction accuracy (fraction correct) of SVM and ANN classifiers optimized by various training schemes for GC descriptors (upper graph: logarithmic scale; lower graph: linear scale).

SVM training resulted in models showing slightly higher prediction accuracy than the ANN systems (Table 1). A 1-2% gain was observed, independent of the number of training samples and method used for neural network training. Figures 3 and 4 illustrate the dependency of the classification accuracy on the number of sample molecules used for training. In one experiment only GC descriptors were used (Figure 3); in a second study the combination of GC, MOE, and CATS descriptors was employed (Figure 4). With the GC descriptor the SVM estimator only slightly outperforms the neural networks (Figure 3). Similar results were obtained if only MOE or CATS descriptors were used for training (data not shown). The situation changed when all descriptors were used. With the complete descriptor set (525-dimensional) SVM clearly outperforms the neural network system (Figure 4). These results substantiate earlier findings that SVM performs better than ANN when large numbers of features or descriptors are used [12]. A general observation was the fact that classification accuracy significantly improved with an increasing number of training samples, reaching a plateau in performance between 2000 and 3000 samples (Figures 3 and 4). The accuracy curves represent almost ideal learning behavior. It should be mentioned that the performance plateau observed does not reflect an inherent clustering of the data set, as training data subsets were randomly selected from the pool. The fraction correctly predicted grows from approximately 65% to 80% when the training set is increased by a factor of 250. The combination of MOE, GC, and CATS descriptors improved classification accuracy by approximately two percent for SVM and by one percent for ANN compared to models based on individual descriptors.
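The learning curves discussed above rest on the shrinking training sets described under MODEL VALIDATION: an 80+20 split first, then each training set reduced to 80% of the previous one. A small Python sketch of such a schedule follows. The stopping size `min_size` and the rounding down to whole molecules are assumptions made here, since the text does not state where the reduction ended or how fractional sizes were handled.

```python
def training_set_sizes(n_total, train_fraction=0.8, shrink=0.8, min_size=30):
    """List training-set sizes, each a fixed fraction of the previous one.

    min_size is a hypothetical stopping point, not a value from the paper.
    """
    size = int(n_total * train_fraction)
    sizes = []
    while size >= min_size:
        sizes.append(size)
        size = int(size * shrink)
    return sizes


# 9208 molecules formed the working set in the study.
sizes = training_set_sizes(9208)
span = sizes[0] / sizes[-1]
```

Because the sizes form a geometric sequence, they are evenly spaced on a logarithmic axis, which matches the logarithmic upper graphs of Figures 3 and 4 and gives denser sampling at small training sets.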
These results demonstrate that optimal ANN training depends to a large extent on the number of training patterns available and the type of molecular descriptors used. For instance, for GC descriptors the best learning algorithm was training with automated regularization, but for the combination of GC, MOE, and CATS descriptors this algorithm was extremely slow and its convergence was relatively unstable. In contrast, SVM generally performed more stably compared to ANN, with only a small increase in computation time for both sets of descriptors (Figures 3 and 4). In a previous comparison of SVM to several machine learning methods by Holden and co-workers it was shown that an SVM classifier outperformed other standard methods, but a specially designed and structurally optimized neural network was again superior to the SVM model in a benchmark test [13]. This is supported by the observation that in the present study the set of molecules which were correctly classified by both SVM and ANN (mutual true positives) was 72% on average, and the fraction incorrectly classified by both systems (mutual false negatives) was 11%. 10% of the test data were correctly predicted by SVM but failed by ANN, and 6% were correctly classified by ANN but not by SVM using the full set of descriptors (GC+MOE+CATS). Examples of the latter two sets of molecules are shown in Figure 5. Clearly, the ANN classifier and the SVM classifier complement each other, and both methods could be further optimized, for example, by changing the SVM kernel or by exploring more sophisticated ANN architectures and concepts. Fast classifier systems are mainly developed for first-pass virtual screening, in particular for identification ("flagging") of potentially undesired molecules in very large compound collections [2].

Figure 4. Average cross-validated prediction accuracy (fraction correct) of SVM and ANN classifiers optimized by various training schemes for the combination of GC, MOE, and CATS descriptors (upper graph: logarithmic scale; lower graph: linear scale).
2 Due to robust covergece behavior SVM seems to be well-suited for solvig biary decisio problems i molecular iformatics, especially whe a large umber of descriptors is available for characterizatio of molecules. I this study we have show that two drug-likeess estimators ca produce complemetary predictios. We recommed the parallel applicatio of both predictive systems for virtual screeig applicatios. Oe possibility to combie several estimators for drug-likeess or ay other classificatio task is to employ a jury decisio, e.g. calculate a esemble

average.38,39 As more and more different predictors become available for virtual screening, a meaningful combination of prediction systems that exploits the individual strengths of the different methods will be pivotal for reliable compound library filtering.

1888 J. Chem. Inf. Comput. Sci., Vol. 43, No. 6, 2003 BYVATOV ET AL.

Figure 5. Examples of drugs correctly classified by ANN but not by SVM (structures 1-5), and drugs correctly classified by SVM but not by ANN (structures 6-10).

CONCLUSION

It was demonstrated that the SVM system used in this study has the capacity to produce higher overall prediction accuracy than a particular ANN architecture. Based on this observation we conclude that SVM represents a useful method for classification tasks in QSAR modeling and virtual screening, especially when large numbers of input variables are used. The SVM classifier was shown to complement the predictions obtained by ANN. The SVM and ANN classifiers obtained for drug-likeness prediction are comparable in overall accuracy and produce overlapping, yet not identical sets of correctly and misclassified compounds. A similar observation can be made when two ANN models are compared. Different ANN architectures and training algorithms were shown to lead to different classification results. Therefore, it might be wise to apply several predictive models in parallel, irrespective of their nature, i.e., being SVM- or ANN-based.

We wish to stress that our study does not justify the conclusion that SVM outperforms ANN in general. In the present work only a standard feed-forward network with a fixed number of hidden neurons was compared to a standard SVM implementation. Nevertheless, our results indicate that solutions obtained by SVM training seem to be more robust with a smaller standard error compared to standard ANN training. Irrespective of the outcome of this study, it is the appropriate choice of training data and descriptors, and reasonable scaling of input variables that determines the success or failure of machine learning systems. Both methods are suited to assess the usefulness of different descriptor sets for a given classification task, and they are methods of choice for rapid first-pass filtering of compound libraries.40

A particular advantage of SVM is sparseness of the solution. This means that an SVM classifier depends only on the support vectors, and the classifier function is not influenced by the whole data set, as it is the case for many neural network systems. Another characteristic of SVM is the possibility to efficiently deal with a very large number of features due to the exploitation of kernel functions, which makes it an attractive technique, e.g., for gene chip analysis or high-dimensional chemical spaces. The combination of SVM with a feature selection routine might provide an efficient tool for extracting chemically relevant information.

ACKNOWLEDGMENT

The authors are grateful to Norbert Dichter and Ralf Tomczak for setting up the LSF Linux cluster. Alireza Givehchi is thanked for assistance in installing the gecco! Web interface. This work was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt.

REFERENCES AND NOTES

(1) Clark, D. E.; Pickett, S. D. Computational methods for the prediction of drug-likeness. Drug Discov. Today 2000, 5, 49-58.
(2) Schneider, G.; Böhm, H.-J. Virtual screening and fast automated docking methods. Drug Discov. Today 2002, 7, 64-70.
(3) Wold, S. Exponentially weighted moving principal component analysis and projections to latent structures. Chemomet. Intell. Lab. Syst. 1994, 23, 149-161.
(4) Forina, M.; Casolino, M. C.; de la Pezuela Martinez, C. Multivariate calibration: applications to pharmaceutical analysis. J. Pharm. Biomed. Anal. 1998, 18, 21-33.
(5) Neural Networks in QSAR and Drug Design; Devillers, J., Ed.; Academic Press: London, 1996.
(6) Ajay; Walters, W. P.; Murcko, M. A. Can we learn to distinguish between drug-like and nondrug-like molecules? J. Med. Chem. 1998, 41, 3314-3324.
(7) Sadowski, J.; Kubinyi, H. A scoring scheme for discriminating between drugs and nondrugs. J. Med. Chem. 1998, 41, 3325-3329.
(8) Sadowski, J.
Optimization of chemical libraries by neural networks. Curr. Opin. Chem. Biol. 2000, 4, 280-282.
(9) Schneider, G. Neural networks are useful tools for drug design. Neural Networks 2000, 13, 15-16.
(10) Sadowski, J. In Virtual Screening for Bioactive Molecules; Böhm, H.-J., Schneider, G., Eds.; Wiley-VCH: Weinheim, 2000; pp 117-129.
(11) Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning 1995, 20, 273-297.
(12) Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin, 1995.
(13) Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S. Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput. Chem. 2001, 26, 5-14.
(14) Warmuth, M. K.; Liao, J.; Ratsch, G.; Mathieson, M.; Putta, S.; Lemmen, C. Active learning with Support Vector Machines in the drug discovery process. J. Chem. Inf. Comput. Sci. 2003, 43, 667-673.
(15) Wilton, D.; Willett, P.; Lawson, K.; Mullier, G. Comparison of ranking methods for virtual screening in lead-discovery programs. J. Chem. Inf. Comput. Sci. 2003, 43, 469-474.
(16) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim, 2000.
(17) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 1. Partition coefficients as a measure of hydrophobicity. J. Comput. Chem. 1986, 7, 565-577.
(18) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 2. Modeling dispersive and hydrophobic interactions. J. Comput. Chem. 1987, 27, 21-35.
(19) Ghose, A. K.; Pritchett, A.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 3. J. Comput. Chem. 1988, 9, 80-90.
(20) Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. Scaffold-hopping by topological pharmacophore search: a contribution to virtual screening. Angew. Chem., Int. Ed. Engl. 1999, 38, 2894-2896.
(21) Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Comput. Methods 1990, 3, 537-547.
(22) Coleman, T. F.; Li, Y. A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J. Optimization 1996, 6, 1040-1058.
(23) Joachims, T. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning; Schölkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, 1999; pp 41-56.
(24) Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods; Cambridge University Press: Cambridge, 2000.
(25) Burges, C. J. C. A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 1998, 2, 121-167.
(26) Bishop, C. M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, 1995.
(27) Joachims, T. Learning to classify text using Support Vector Machines. Kluwer International Series in Engineering and Computer Science 668; Kluwer Academic Publishers: Boston, 2002.
(28) Duda, R. O.; Hart, P. E.; Stork, D. G. Pattern Classification; Wiley-Interscience: New York, 2000.
(29) Rumelhart, D. E.; McClelland, J. L.; The PDP Research Group. Parallel Distributed Processing; MIT Press: Cambridge, MA, 1986.
(30) Hagan, M. T.; Demuth, H. B.; Beale, M. H. Neural Network Design; PWS Publishing: Boston, 1996.
(31) Fletcher, R.; Reeves, C. M.
Function minimization by conjugate gradients. Comput. J. 1964, 7, 149-154.
(32) Moller, M. F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 1993, 6, 525-533.
(33) Dennis, J. E.; Schnabel, R. B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall: Englewood Cliffs, 1983.
(34) Hagan, M. T.; Menhaj, M. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 1994, 5, 989-993.
(35) Foresee, F. D.; Hagan, M. T. Gauss-Newton approximation to Bayesian regularization. Proceedings of the 1997 International Joint Conference on Neural Networks; pp 1930-1935.
(36) MacKay, D. J. C. Bayesian interpolation. Neural Comput. 1992, 4, 415-447.
(37) Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442-451.
(38) Krogh, A.; Sollich, P. Statistical mechanics of ensemble learning. Phys. Rev. E 1997, 55, 811-825.
(39) Baldi, P.; Brunak, S. Bioinformatics - The Machine Learning Approach; MIT Press: Cambridge, 1998.
(40) Byvatov, E.; Schneider, G. Support vector machine applications in bioinformatics. Appl. Bioinf. 2003, 2, 67-77.

CI0341161
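
As an illustration of the jury decision discussed above, the following minimal Python sketch averages the outputs of several drug-likeness estimators and thresholds the ensemble average. It is not the procedure used in this study: the function names, the rescaling of every estimator output to a common [0, 1] range, the 0.5 cutoff, and the example scores are all assumptions made for illustration.

```python
from statistics import mean


def jury_score(scores):
    """Ensemble average of per-model drug-likeness scores.

    Assumption: every estimator output was rescaled to [0, 1] beforehand.
    """
    if not scores:
        raise ValueError("at least one estimator score is required")
    return mean(scores)


def jury_decision(scores, threshold=0.5):
    """Label a compound 'drug' when the ensemble average reaches the threshold.

    The 0.5 cutoff is a hypothetical choice, not a value from the study.
    """
    return "drug" if jury_score(scores) >= threshold else "nondrug"


# Hypothetical outputs of two estimators (e.g. one SVM, one ANN) per compound.
compounds = {
    "cmpd_A": [0.9, 0.7],  # both estimators vote drug-like
    "cmpd_B": [0.8, 0.3],  # estimators disagree
    "cmpd_C": [0.2, 0.1],  # both estimators vote nondrug-like
}

labels = {name: jury_decision(scores) for name, scores in compounds.items()}
for name, label in labels.items():
    print(name, label)
```

In this sketch a compound on which the estimators disagree is decided by the averaged score rather than by either model alone. Other combination rules, such as majority voting or weighted averages, could be substituted inside `jury_score` without changing the calling code.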