Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification


J. Chem. Inf. Comput. Sci. 2003, 43, 1882

Evgeny Byvatov, Uli Fechner, Jens Sadowski, and Gisbert Schneider*

Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Strasse 11, Frankfurt, Germany, and AstraZeneca R&D Mölndal, SC 264, Mölndal, Sweden

Received June 13, 2003

Support vector machine (SVM) and artificial neural network (ANN) systems were applied to a drug/nondrug classification problem as an example of binary decision problems in early-phase virtual compound filtering and screening. The results indicate that solutions obtained by SVM training seem to be more robust with a smaller standard error compared to ANN training. Generally, the SVM classifier yielded slightly higher prediction accuracy than ANN, irrespective of the type of descriptors used for molecule encoding, the size of the training data sets, and the algorithm employed for neural network training. The performance was compared using various descriptor sets and descriptor combinations based on the 120 standard Ghose-Crippen fragment descriptors, a wide range of 180 different properties and physicochemical descriptors from the Molecular Operating Environment (MOE) package, and 225 topological pharmacophore (CATS) descriptors. For the complete set of 525 descriptors cross-validated classification by SVM yielded 82% correct predictions (Matthews cc = 0.63), whereas ANN reached 80% correct predictions (Matthews cc = 0.58). Although SVM outperformed the ANN classifiers with regard to overall prediction accuracy, both methods were shown to complement each other, as the sets of true positives, false positives (overprediction), true negatives, and false negatives (underprediction) produced by the two classifiers were not identical. The theory of SVM and ANN training is briefly reviewed.
* Corresponding author.

INTRODUCTION

Early-phase virtual screening and compound library design often employs filtering routines which are based on binary classifiers and are meant to eliminate potentially unwanted molecules from a compound library [1,2]. Currently two classifier systems are most often used in these applications: PLS-based classifiers [3,4] and various types of artificial neural networks (ANN) [5-9]. Typically, these systems yield an average overall accuracy of 80% correct predictions for binary decision tasks following the "likeness concept" in virtual screening [2,10]. The support vector machine (SVM) approach was first introduced by Vapnik as a potential alternative to conventional artificial neural networks [11,12]. Its popularity has grown ever since in various areas of research, and first applications in molecular informatics and pharmaceutical research have been described. Although SVM can be applied to multiclass separation problems, its original implementation solves binary class/nonclass separation problems. Here we describe application of SVM to the drug/nondrug classification problem, which employs a class/nonclass implementation of SVM.

Both SVM and ANN algorithms can be formulated in terms of learning machines. The standard scenario for classifier development consists of two stages: training and testing. During the first stage the learning machine is presented with labeled samples, which are basically n-dimensional vectors with a class membership label attached. The learning machine generates a classifier for prediction of the class label of the input coordinates. During the second stage, the generalization ability of the model is tested.

Currently various sets of molecular descriptors are available [16]. For application to drug/nondrug classification of compounds, the molecules are typically represented by n-dimensional vectors [6,7]. In this work, we focused on the fragment-based Ghose-Crippen (GC) descriptors, which were used in the original work of Sadowski and Kubinyi for drug/nondrug classification [7], descriptors provided by the MOE software package (Molecular Operating Environment. Chemical Computing Group Inc., Montreal, Canada), and CATS topological pharmacophores [20]. Having defined this molecular representation, the task of the present study was to compare the classification ability of standard SVM and feed-forward ANN on the drug/nondrug data. A www-based interface for calculating the drug-likeness score of a molecule using our SVM solution based on the CATS descriptor was developed and can be found at URL: gecco.org.chemie.uni-frankfurt.de/gecco.html.

J. Chem. Inf. Comput. Sci., Vol. 43, No. 6. American Chemical Society. Published on Web 09/27/2003.

DATA AND METHODS

Data Sets. For SVM and ANN training we used the sets of drug and nondrug molecules prepared by Kubinyi and Sadowski [7]. From the original data set 9208 molecules could be processed by our descriptor generation software. The final working set contained 4998 drugs and 4210 nondrug molecules. Three sets of descriptors were calculated: counts of the standard 120 Ghose-Crippen descriptors, 180 descriptors from MOE (Molecular Operating Environment. Chemical Computing Group Inc., Montreal, Canada), and 225 topological pharmacophore (CATS) descriptors [20]. MOE descriptors include various 2D and 3D descriptors such as volume and shape descriptors, atom and bond counts, Kier-Hall connectivity and kappa shape indices, adjacency and distance matrix descriptors, pharmacophore feature descriptors, partial charges, potential energy descriptors, and conformation-dependent charge descriptors. Before calculating MOE descriptors, single 3D conformers were generated by CORINA. CATS descriptors were calculated using our own software taking into consideration pairs of atom types separated by up to 15 bonds (URL: gecco.org.chemie.uni-frankfurt.de/gecco.html) [20]. All 225 descriptor columns were individually autoscaled. An alternative would have been block-scaling, where each descriptor class is autoscaled as a whole, which was not applied here.

Support Vector Machine. SVM classifiers are generated by a two-step procedure: First, the sample data vectors are mapped ("projected") to a very high-dimensional space. The dimension of this space is significantly larger than the dimension of the original data space. Then, the algorithm finds a hyperplane in this space with the largest margin separating classes of data. It was shown that classification accuracy usually depends only weakly on the specific projection, provided that the target space is sufficiently high dimensional [11]. Sometimes it is not possible to find the separating hyperplane even in a very high-dimensional space. In this case a tradeoff is introduced between the size of the separating margin and penalties for every vector which is within the margin [11]. The basic theory of SVM will be briefly reviewed in the following.

The separating hyperplane is defined as

$$D(\mathbf{x}) = (\mathbf{w} \cdot \mathbf{x}) + w_0$$

Here $\mathbf{x}$ is a sample vector mapped to a high-dimensional space, and $\mathbf{w}$ and $w_0$ are parameters of the hyperplane that SVM will estimate.
Then the margin can be expressed as the largest $\tau$ for which

$$y_k D(\mathbf{x}_k) \ge \tau \|\mathbf{w}\|$$

holds for every sample vector $\mathbf{x}_k$. Without loss of generality we can apply the constraint $\tau \|\mathbf{w}\| = 1$ to $\mathbf{w}$. In this case maximizing $\tau$ is equivalent to minimizing $\|\mathbf{w}\|$, and SVM training becomes the problem of finding the minimum of a function under the following constraints:

$$\text{minimize} \quad \eta(\mathbf{w}) = \frac{1}{2} (\mathbf{w} \cdot \mathbf{w}) \quad \text{subject to constraints} \quad y_i [(\mathbf{w} \cdot \mathbf{x}_i) + w_0] \ge 1$$

This problem is solved by introduction of Lagrange multipliers and minimization of the function

$$Q(\mathbf{w}, w_0, \boldsymbol{\alpha}) = \frac{1}{2} (\mathbf{w} \cdot \mathbf{w}) - \sum_{i=1}^{n} \alpha_i \{ y_i [(\mathbf{w} \cdot \mathbf{x}_i) + w_0] - 1 \}$$

Here $\alpha_i$ are Lagrange multipliers. Differentiating over $\mathbf{w}$ and $w_0$ and substituting we obtain

$$\text{maximize} \quad Q(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \quad \text{subject to constraints} \quad \sum_{i=1}^{n} y_i \alpha_i = 0; \quad \alpha_i \ge 0, \; i = 1, \ldots, n$$

When perfect separation is not possible, slack variables are introduced for sample vectors which are within the margin, and the optimization problem can be reformulated:

$$\text{minimize} \quad \eta(\mathbf{w}) = \frac{1}{2} (\mathbf{w} \cdot \mathbf{w}) + C \sum_{i} \xi_i \quad \text{subject to constraints} \quad y_i [(\mathbf{w} \cdot \mathbf{x}_i) + w_0] \ge 1 - \xi_i$$

Here $\xi_i$ are slack variables. These variables are not equal to zero only for those vectors which are within the margin. Introducing Lagrange multipliers again we finally obtain

$$\text{maximize} \quad Q(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \quad \text{subject to constraints} \quad \sum_{i=1}^{n} y_i \alpha_i = 0; \quad C \ge \alpha_i \ge 0, \; i = 1, \ldots, n$$

This is a quadratic programming (QP) problem for which several efficient standard methods are known [22]. Due to the very high dimensionality of the QP problem, which typically arises during SVM training, an extension of the algorithm for solving QP is used in SVM applications [23].

Figure 1. Principle of SVM classification. The task was to separate two classes of objects indicated by squares and circles. Squares represent nonclass samples ("negative examples", e.g. nondrugs) and circles are class members ("positive examples", e.g. drugs). $D(\mathbf{x})$ is the decision function defining class membership according to the SVM classifier, which is represented by the separating line ($D(\mathbf{x}) = 0$). The margin is indicated by dotted lines. Support vectors are indicated by filled objects ($\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$). $\xi_i$ are slack variables for support vectors that are not lying on the margin border. $y_i$ are label variables equal to 1 for positive examples (class membership) and -1 for negative examples (nonclass membership). See text for details.

A geometrical illustration of the meaning of slack variables and Lagrange multipliers is given in Figure 1. Points classified by SVM can be divided into two groups, support vectors and nonsupport vectors. Nonsupport vectors are classified correctly by the hyperplane and are located outside the separating margin. Slack variables and Lagrange multipliers for them are equal to zero. Parameters of the hyperplane do not depend on them, and even if their position is changed the separating hyperplane and margin will remain unchanged, provided that these points stay outside the margin. Other points are support vectors, and they are the points which determine the exact position of the hyperplane. For all support vectors the absolute values of the slack variables are equal to the distances from these points to the edge of the separating margin. These distances are defined in units of half of the width of the separating margin. For correctly classified points within the separating margin, slack variable values are between zero and one. For misclassified points within the margin the values of the slack variables are between one and two. For other misclassified points they are greater than two. For points that are lying on the edge of the margin, Lagrange multipliers are between zero and C, and slack variables for these points are still equal to zero. For all other points, for which the values of slack variables are larger than zero, Lagrange multipliers assume the value of C.

Explicit mapping to a very high-dimensional space is not required if calculation of the scalar product in this high-dimensional space of every two vectors is feasible. This scalar product can be defined by introducing a kernel function $(\mathbf{x} \cdot \mathbf{x}') = K(x, x')$ [24], where $x$ and $x'$ are vectors in a low-dimensional space for which a kernel function that corresponds to a scalar product in a high-dimensional space is defined. Various kernels may be applied [25]. In our case, we used a kernel function of a fifth-order polynomial:

$$K(x, x') = ((x \cdot x') s + r)^5$$

This kernel corresponds to the decision function

$$f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i K(x_i^{sv}, x) + b \Big)$$

where $\alpha_i$ are Lagrange multipliers determined during training of SVM. The sum is only over support vectors $x^{sv}$. Lagrange multipliers for all other points are equal to zero.
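The kernel and decision function above can be sketched in a few lines of plain Python. This is only an illustration, not the SVMLight implementation used in the study: the function names, the toy support vectors, and the coefficient values are hypothetical, and the coefficients are assumed to already carry the sign of the class label so that the sum matches the decision function as written.

```python
def poly_kernel(x, z, s=1.0, r=1.0, degree=5):
    """Fifth-order polynomial kernel K(x, z) = ((x . z) * s + r) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot * s + r) ** degree


def decision(x, support_vectors, coefficients, b, s=1.0, r=1.0):
    """Return +1 or -1 from sign(sum_i coeff_i * K(sv_i, x) + b)."""
    total = sum(
        c * poly_kernel(sv, x, s=s, r=r)
        for c, sv in zip(coefficients, support_vectors)
    )
    return 1 if total + b >= 0 else -1


# Hypothetical toy model: two support vectors with signed coefficients.
toy_svs = [[1.0, 0.0], [-1.0, 0.0]]
toy_coeffs = [1.0, -1.0]
label_right = decision([2.0, 0.0], toy_svs, toy_coeffs, b=0.0)
label_left = decision([-2.0, 0.0], toy_svs, toy_coeffs, b=0.0)
```

Only the support vectors enter the sum, so the cost of classifying a new molecule scales with the number of support vectors rather than with the size of the training set.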
Parameter b determines the shift of the hyperplane, and it is also found during SVM training. Simultaneous scaling of the s, r, and b parameters does not change the decision function. Thus, we can simplify the kernel by setting r equal to one:

$$K(x, x') = ((x \cdot x') s + 1)^5$$

In this case only the kernel parameter s and the error tradeoff C must be tuned. Parameter C is not present explicitly in this equation; it is set up as a penalty for the misclassification error before the training of SVM is performed. For tuning parameters s and C, four-times cross-validation of training data was applied, and values for s and C that maximize accuracy were then chosen. Accuracy maximization was performed by heuristics-based gradient descent [26]. Basically, the following procedure was applied. The data set was divided into two parts, training and validation set. The validation subset was put aside and used only for estimation of the performance of the trained classifier. Training data were divided into four nonoverlapping subsets. The SVM parameters to be determined were set to reasonable initial values. Then, the SVM was trained on the training data excluding one of the four subsets, and the performance of the obtained SVM classifier was estimated with the excluded subset. This procedure was repeated for each subset, and an average performance of the SVM classifier was obtained.

Figure 2. Architecture of artificial neural networks. Formal neurons are drawn as circles, weights are represented by lines connecting the neuron layers. Fan-out neurons are drawn in white, sigmoidal units in black, and linear units in gray. (a) Conventional three-layered feed-forward system ("architecture I"); (b) network architecture used by Ajay and co-workers for drug-likeness prediction ("architecture II") [6].

For SVM training we used freely available SVM software (SVMLight package; URL: org/) [26,27]. A Linux-based LSF (Load Sharing Facility; Platform Computing GmbH, Ratingen, Germany) cluster was used for determination of the cross-validation error to reduce calculation time.
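The four-times cross-validation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: it compares a short list of candidate parameter sets directly instead of using the heuristics-based gradient descent of the study, `train_fn` stands for any training routine that returns a classifier, and the threshold learner and toy data are hypothetical stand-ins for SVM training on descriptor vectors.

```python
def split_into_folds(n_samples, n_folds=4):
    """Assign sample indices to n_folds non-overlapping subsets."""
    return [list(range(k, n_samples, n_folds)) for k in range(n_folds)]


def cross_validated_accuracy(train_fn, X, y, params, n_folds=4):
    """Train on all subsets but one, score on the excluded subset, and average."""
    scores = []
    for held_out in split_into_folds(len(X), n_folds):
        excluded = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in excluded]
        model = train_fn([X[i] for i in train_idx],
                         [y[i] for i in train_idx], **params)
        hits = sum(1 for i in held_out if model(X[i]) == y[i])
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


def select_parameters(train_fn, X, y, candidates, n_folds=4):
    """Pick the candidate parameter set with the best averaged accuracy."""
    return max(
        candidates,
        key=lambda p: cross_validated_accuracy(train_fn, X, y, p, n_folds),
    )


def train_threshold(X_train, y_train, shift):
    """Hypothetical stand-in learner: threshold on the first feature."""
    return lambda x: 1 if x[0] > shift else -1


# Toy one-dimensional data: label +1 for non-negative values, -1 otherwise.
X_toy = [[float(v)] for v in range(-4, 4)]
y_toy = [1 if x[0] >= 0 else -1 for x in X_toy]
best = select_parameters(train_threshold, X_toy, y_toy,
                         [{"shift": -0.5}, {"shift": 2.5}])
```

In the setting of the paper, the candidates would be pairs of s and C, and the separate validation subset would stay out of this loop entirely.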
All calculations were performed using the MATLAB package (MATLAB 2002, The mathematical laboratory. The MathWorks GmbH, Aachen, Germany).

ARTIFICIAL NEURAL NETWORK

Conventional two-layered neural networks with a single output neuron were used for ANN model development (Figure 2a) [26]. As a result of network training a decision function is chosen from the family of functions represented by the network architecture. This function family is defined by the complexity of the neural network: number of hidden layers, number of neurons in these layers, and topology of the network. The decision function is determined by choosing appropriate weights for the neural network. Optimal weights usually minimize an error function for the particular network architecture. The error function describes the deviation of predicted target values from observed or desired values. For our class/nonclass classification problem the target values were 1 for class (drugs) and -1 for nonclass (nondrugs). A standard two-layered neural network with a single output neuron can be represented by the following equation

$$y = \tilde{g}\Big( \sum_{j=1}^{M} w_{1j}^{(2)} \, g\Big( \sum_{i=1}^{d} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \Big) + w_{10}^{(2)} \Big)$$

with the error function

$$E = \sum_{k=1}^{n} (y(\mathbf{x}_k) - y_k)^2$$

In this work, $\tilde{g}$ is a linear function and $g$ is a tan-sigmoid transfer function. A second type of network architecture containing additional connections from the input layer to the output layer was trained to reimplement the original drug/nondrug ANN developed by Ajay and co-workers (Figure 2b) [6]. Training of neural networks is typically performed with variations of gradient descent based algorithms [26], trying to minimize an error function. To avoid overfitting, cross-validation can be used for finding an earlier point of training [28]. In this work the neural network toolbox from MATLAB was used. Data were preprocessed identically to SVM-based learning. We applied the following training algorithms to ANN optimization in their default versions provided by MATLAB: gradient descent with variable learning rate [29,30], conjugate gradient descent [30,31], scaled conjugate gradient descent [32], quasi-Newton algorithm [33], Levenberg-Marquardt (LM) [34,35], and automated regularization [36]. For each optimization ten-times cross-validation was performed (80+20 splits into training and test data), where the ANN weights and biases were optimized using the training data, and prediction accuracy was measured using test data to determine the number of training epochs, i.e., the endpoint of the training process. This was performed to reduce the risk of overfitting. It should be noted that the validation data were left untouched.

Table 1. Cross-Validated Results of Machine Learning (a). Columns: descriptors; % correct (ANN, SVM); Matthews cc (ANN, SVM). Rows: GC, MOE, CATS, all (GC+MOE+CATS). (a) Average values and standard deviations are given. The Levenberg-Marquardt training method was used for ANN training.

MODEL VALIDATION

The SVM model for drug/nondrug classification of a pattern x was

$$\mathrm{SVM}(x) = \sum_{i} a_i K(x_i^{SV}, x) + b$$

Here, i runs only over support vectors (SV). The value of SVM(x) is either positive ("drug") or negative ("nondrug"). The ANN model for drug/nondrug classification produced values in ]-1,1[, where a positive value meant "drug" and a negative value "nondrug". Classification accuracy was evaluated based on prediction accuracy, i.e., percent of test compounds correctly classified, and the correlation coefficient according to Matthews [37]:

$$cc = \frac{NP - OU}{\sqrt{(N + O)(N + U)(P + O)(P + U)}}$$

where P, N, O, and U are the numbers of true positive, true negative, false positive, and false negative predictions, respectively.
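As a worked illustration of the Matthews formula, the following sketch computes cc from the four prediction counts. The function name and the example counts are hypothetical and are not taken from the study; returning 0.0 for an undefined denominator is a convention assumed here, not something stated in the text.

```python
import math


def matthews_cc(P, N, O, U):
    """Matthews correlation coefficient.

    P: true positives, N: true negatives,
    O: false positives (overprediction), U: false negatives (underprediction).
    """
    denominator = math.sqrt((N + O) * (N + U) * (P + O) * (P + U))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (N * P - O * U) / denominator


# Hypothetical balanced test set of 100 compounds with 80% correct predictions.
cc_example = matthews_cc(P=40, N=40, O=10, U=10)
```

With these balanced hypothetical counts, 80% correct predictions correspond to cc = 0.6, which is in the same range as the values quoted in the abstract for comparable accuracy.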
Drugs were considered as the positive set; the nondrug molecules formed the negative set. The values of cc can range from -1 to 1. Perfect prediction gives a correlation coefficient of 1. SVM and ANN models were developed using various sizes of training data to measure the influence of the size of the training set on the quality of the classification model. The number of training samples was iteratively diminished: Starting with a random split of all available samples into training and validation subsets, at each of the following iterations we diminished the size of the training set to only 80% of the number of samples of the previous iteration. This allowed us to obtain better sampling for small training sets. Ten-times cross-validation was performed, and average values of prediction accuracy and cc were calculated.

RESULTS AND DISCUSSION

The main aim of this study was to compare SVM and ANN classifiers in their ability to distinguish between sets of drugs and nondrugs. We trained different neural network topologies, and performance of the best network was compared to the SVM classifier. Two types of ANN architecture were considered: standard feed-forward networks with one hidden layer ("architecture I") and a feed-forward network with one hidden layer with additional direct connections from input neurons to the output ("architecture II") (Figure 2). The first type of ANN was used by Sadowski and Kubinyi in their original work on drug-likeness prediction [7]; the second architecture was employed by Ajay and co-workers serving the same purpose [6]. Using these networks and the GC descriptors in combination with the Levenberg-Marquardt training method, classification accuracy was identical to the original results (on average 80% correct) despite the use of a different training technique and different training data (Table 1). This observation substantiates the original findings. Both network types performed identically considering the error margin (approximately 80% correct classification).
For some of the training algorithms a slightly lower standard deviation of the prediction accuracy was observed for architecture I (data not shown). Since the additional connections in network architecture II did not contribute to a greater accuracy of the model, we used only the standard feed-forward network with one hidden layer containing two neurons (architecture I) for further analysis. For each training method and combination of input variables (descriptors) networks with different numbers of hidden neurons (2-10 neurons) were trained. We did not observe an overall best training algorithm. The Levenberg-Marquardt method was used for the development of the final ANN model. Also, we did not observe an improved classification result when the number of hidden neurons was larger than two (data not shown). ANN architecture I with two hidden neurons yielded the overall best cross-validated prediction result for all descriptors (GC+MOE+CATS), 80% correct predictions (cc = 0.58). The rank order of descriptor sets with regard to the overall classification accuracy was as follows: All > GC > MOE > CATS (Table 1). It should be stressed that the differences in classification accuracy are minute for the descriptors All, MOE, and GC and should be regarded as comparable considering a standard deviation of 1%. The CATS descriptor led to approximately 5% lower accuracy.
Figure 3. Average cross-validated prediction accuracy (fraction correct) of SVM and ANN classifiers optimized by various training schemes for GC descriptors (upper graph: logarithmic scale; lower graph: linear scale).

SVM training resulted in models showing slightly higher prediction accuracy than the ANN systems (Table 1). A 1-2% gain was observed, independent of the number of training samples and the method used for neural network training. Figures 3 and 4 illustrate the dependency of the classification accuracy on the number of sample molecules used for training. In one experiment only GC descriptors were used (Figure 3); in a second study the combination of GC, MOE, and CATS descriptors was employed (Figure 4). With the GC descriptor the SVM estimator only slightly outperforms the neural networks (Figure 3). Similar results were obtained if only MOE or CATS descriptors were used for training (data not shown). The situation changed when all descriptors were used. With the complete descriptor set (525-dimensional) SVM clearly outperforms the neural network system (Figure 4). These results substantiate earlier findings that SVM performs better than ANN when large numbers of features or descriptors are used [12]. In general, classification accuracy improved significantly with an increasing number of training samples, reaching a plateau in performance between 2000 and 3000 samples (Figures 3 and 4). The accuracy curves represent almost ideal learning behavior. It should be mentioned that the performance plateau observed does not reflect an inherent clustering of the data set, as training data subsets were randomly selected from the pool. The fraction correctly predicted grows from approximately 65% to 80% when the training set is increased by a factor of 250. The combination of MOE, GC, and CATS descriptors improved classification accuracy by approximately two percent for SVM and by one percent for ANN compared to models based on individual descriptors.
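The training-set sizes behind these learning curves follow the rule given under Model Validation: each iteration keeps 80% of the samples of the previous one. The sketch below lists such a series of sizes and estimates how many shrinking steps span a 250-fold range. The function name, the starting size, and the minimum size are hypothetical, and rounding down at each step is an assumption.

```python
import math


def shrinking_training_sizes(n_start, n_min):
    """List training-set sizes, each 80% of the previous one (rounded down)."""
    if n_min < 1:
        raise ValueError("n_min must be at least 1")
    sizes = []
    n = n_start
    while n >= n_min:
        sizes.append(n)
        n = (n * 4) // 5  # keep 80% of the samples of the previous iteration
    return sizes


# Hypothetical example: start with 1000 samples, stop below 500.
toy_sizes = shrinking_training_sizes(1000, 500)

# Number of 80% steps needed to cover a 250-fold change in training-set size.
steps_for_factor_250 = math.log(250) / math.log(1.25)
```

Under these assumptions a 250-fold range corresponds to roughly 25 shrinking steps, and because the sizes are geometrically spaced, small training sets are sampled more densely than large ones.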
These results demonstrate that optimal ANN training depends to a large extent on the number of training patterns available and the type of molecular descriptors used. For instance, for GC descriptors the best learning algorithm was training with automated regularization, but for the combination of GC, MOE, and CATS descriptors this algorithm was extremely slow and converged relatively unstably. In contrast, SVM generally performed more stably compared to ANN, with only a small increase in computation time for both sets of descriptors (Figures 3 and 4).

Figure 4. Average cross-validated prediction accuracy (fraction correct) of SVM and ANN classifiers optimized by various training schemes for the combination of GC, MOE, and CATS descriptors (upper graph: logarithmic scale; lower graph: linear scale).

In a previous comparison of SVM to several machine learning methods by Holden and co-workers it was shown that an SVM classifier outperformed other standard methods, but a specially designed and structurally optimized neural network was again superior to the SVM model in a benchmark test [13]. This is in line with the present study, in which the set of molecules correctly classified by both SVM and ANN (mutual true positives) was 72% on average, and the fraction incorrectly classified by both systems (mutual false negatives) was 11%. 10% of the test data were correctly predicted by SVM but failed by ANN, and 6% were correctly classified by ANN but not by SVM using the full set of descriptors (GC+MOE+CATS). Examples of the latter two sets of molecules are shown in Figure 5. Clearly, the ANN classifier and the SVM classifier complement each other, and both methods could be further optimized, for example, by changing the SVM kernel or by exploring more sophisticated ANN architectures and concepts. Fast classifier systems are mainly developed for first-pass virtual screening, in particular for identification ("flagging") of potentially undesired molecules in very large compound collections.
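The four fractions quoted above (correct by both, wrong by both, correct by SVM only, correct by ANN only) can be tallied from two prediction lists as sketched below. The function name, the dictionary keys, and the ten toy labels are hypothetical and are not data from the study.

```python
def agreement_fractions(y_true, pred_svm, pred_ann):
    """Fraction of samples in each agreement category of two classifiers."""
    counts = {"both_correct": 0, "both_wrong": 0, "svm_only": 0, "ann_only": 0}
    for truth, svm_label, ann_label in zip(y_true, pred_svm, pred_ann):
        svm_ok = svm_label == truth
        ann_ok = ann_label == truth
        if svm_ok and ann_ok:
            counts["both_correct"] += 1
        elif svm_ok:
            counts["svm_only"] += 1
        elif ann_ok:
            counts["ann_only"] += 1
        else:
            counts["both_wrong"] += 1
    total = len(y_true)
    return {key: value / total for key, value in counts.items()}


# Hypothetical labels: +1 for drug, -1 for nondrug.
y_true = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
pred_svm = [1, 1, 1, 1, -1, -1, -1, -1, 1, -1]
pred_ann = [1, 1, 1, -1, 1, -1, -1, -1, 1, 1]
fractions = agreement_fractions(y_true, pred_svm, pred_ann)
```

The four categories are mutually exclusive and cover every sample, so the fractions sum to one; the reported values of 72%, 11%, 10%, and 6% add up to 99%, presumably because of rounding.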
[2] Due to robust convergence behavior SVM seems to be well-suited for solving binary decision problems in molecular informatics, especially when a large number of descriptors is available for characterization of molecules. In this study we have shown that two drug-likeness estimators can produce complementary predictions. We recommend the parallel application of both predictive systems for virtual screening applications. One possibility to combine several estimators for drug-likeness or any other classification task is to employ a "jury decision", e.g. calculate an ensemble average [38,39]. As more and more different predictors become available for virtual screening, a meaningful combination of prediction systems that exploits the individual strengths of the different methods will be pivotal for reliable compound library filtering.

Figure 5. Examples of drugs correctly classified by ANN but not by SVM (structures 1-5), and drugs correctly classified by SVM but not by ANN (structures 6-10).

CONCLUSION

It was demonstrated that the SVM system used in this study has the capacity to produce higher overall prediction accuracy than a particular ANN architecture. Based on this observation we conclude that SVM represents a useful method for classification tasks in QSAR modeling and virtual screening, especially when large numbers of input variables are used. The SVM classifier was shown to complement the predictions obtained by ANN. The SVM and ANN classifiers obtained for drug-likeness prediction are comparable in overall accuracy and produce overlapping, yet not identical sets of correctly and misclassified compounds. A similar observation can be made when two ANN models are compared. Different ANN architectures and training algorithms were shown to lead to different classification results. Therefore, it might be wise to apply several predictive models in parallel, irrespective of their nature, i.e., being SVM- or ANN-based. We wish to stress that our study does not justify the conclusion that SVM outperforms ANN in general. In the present work only a standard feed-forward network with a fixed number of hidden neurons was compared to a standard SVM implementation. Nevertheless, our results indicate that solutions obtained by SVM training seem to be more robust with a smaller standard error compared to standard ANN training. Irrespective of the outcome of this study, it is the appropriate choice of training data and descriptors, and reasonable scaling of input variables that determines the success or failure of machine learning systems. Both methods are suited to assess the usefulness of different descriptor sets for a given classification task, and they are methods of choice for rapid first-pass filtering of compound libraries [40]. A particular advantage of SVM is "sparseness of the solution". This means that an SVM classifier depends only on the support vectors, and the classifier function is not influenced by the whole data set, as is the case for many neural network systems. Another characteristic of SVM is the possibility to efficiently deal with a very large number of features due to the exploitation of kernel functions, which makes it an attractive technique, e.g., for gene chip analysis or high-dimensional chemical spaces. The combination of SVM with a feature selection routine might provide an efficient tool for extracting chemically relevant information.

ACKNOWLEDGMENT

The authors are grateful to Norbert Dichter and Ralf Tomczak for setting up the LSF Linux cluster. Alireza Givehchi is thanked for assistance in installing the gecco! Web interface. This work was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt.

REFERENCES AND NOTES

(1) Clark, D. E.; Pickett, S. D. Computational methods for the prediction of drug-likeness. Drug Discov. Today 2000, 5.
(2) Schneider, G.; Böhm, H.-J. Virtual screening and fast automated docking methods. Drug Discov. Today 2002, 7.
(3) Wold, S. Exponentially weighted moving principal component analysis and projections to latent structures. Chemomet. Intell. Lab. Syst. 1994, 23.
(4) Forina, M.; Casolino, M. C.; de la Pezuela Martinez, C. Multivariate calibration: applications to pharmaceutical analysis. J. Pharm. Biomed. Anal. 1998, 18.
(5) Neural Networks in QSAR and Drug Design; Devillers, J., Ed.; Academic Press: London.
(6) Ajay; Walters, W. P.; Murcko, M. A. Can we learn to distinguish between drug-like and nondrug-like molecules? J. Med. Chem. 1998, 41.
(7) Sadowski, J.; Kubinyi, H. A scoring scheme for discriminating between drugs and nondrugs. J. Med. Chem. 1998, 41.
(8) Sadowski, J. Optimization of chemical libraries by neural networks. Curr. Opin. Chem. Biol. 2000, 4.
(9) Schneider, G. Neural networks are useful tools for drug design. Neural Networks 2000, 13.
(10) Sadowski, J. In Virtual Screening for Bioactive Molecules; Böhm, H.-J., Schneider, G., Eds.; Wiley-VCH: Weinheim, 2000.
(11) Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning 1995, 20.
(12) Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin.
(13) Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S. Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput. Chem. 2001, 26.
(14) Warmuth, M. K.; Liao, J.; Ratsch, G.; Mathieson, M.; Putta, S.; Lemmen, C. Active learning with Support Vector Machines in the drug discovery process. J. Chem. Inf. Comput. Sci. 2003, 43.
(15) Wilton, D.; Willett, P.; Lawson, K.; Mullier, G. Comparison of ranking methods for virtual screening in lead-discovery programs. J. Chem. Inf. Comput. Sci. 2003, 43.
(16) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim.
(17) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 1. Partition coefficients as a measure of hydrophobicity. J. Comput. Chem. 1986, 7.
(18) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 2. Modeling dispersive and hydrophobic interactions. J. Comput. Chem. 1987, 27.
(19) Ghose, A. K.; Pritchett, A.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 3. J. Comput. Chem. 1988, 9.
(20) Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. Scaffold-hopping by topological pharmacophore search: a contribution to virtual screening. Angew. Chem., Int. Ed. Engl. 1999, 38.
(21) Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Comput. Methods 1990, 3.
(22) Coleman, T. F.; Li, Y. A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J. Optimization 1996, 6.
(23) Joachims, T. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning; Schölkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, 1999.
(24) Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods; Cambridge University Press: Cambridge.
(25) Burges, C. J. C. A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 1998, 2.
(26) Bishop, C. M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford.
(27) Joachims, T. Learning to classify text using Support Vector Machines. Kluwer International Series in Engineering and Computer Science 668; Kluwer Academic Publishers: Boston.
(28) Duda, R. O.; Hart, P. E.; Stork, D. G. Pattern Classification; Wiley-Interscience: New York.
(29) Rumelhart, D. E.; McClelland, J. L.; The PDP Research Group. Parallel Distributed Processing; MIT Press: Cambridge, MA.
(30) Hagan, M. T.; Demuth, H. B.; Beale, M. H. Neural Network Design; PWS Publishing: Boston.
(31) Fletcher, R.; Reeves, C. M. Function minimization by conjugate gradients. Comput. J. 1964, 7.
(32) Moller, M. F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 1993, 6.
(33) Dennis, J. E.; Schnabel, R. B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall: Englewood Cliffs.
(34) Hagan, M. T.; Menhaj, M. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 1994, 5.
(35) Foresee, F. D.; Hagan, M. T. Gauss-Newton approximation to Bayesian regularization. Proceedings of the 1997 International Joint Conference on Neural Networks.
(36) MacKay, D. J. C. Bayesian interpolation. Neural Comput. 1992, 4.
(37) Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405.
(38) Krogh, A.; Sollich, P. Statistical mechanics of ensemble learning. Phys. Rev. E 1997, 55.
(39) Baldi, P.; Brunak, S. Bioinformatics - The Machine Learning Approach; MIT Press: Cambridge.
(40) Byvatov, E.; Schneider, G. Support vector machine applications in bioinformatics. Appl. Bioinf. 2003, 2.