Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification

J. Chem. Inf. Comput. Sci. 2003, 43, 1882

Evgeny Byvatov, Uli Fechner, Jens Sadowski, and Gisbert Schneider*

Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Strasse 11, Frankfurt, Germany, and AstraZeneca R&D Mölndal, SC 264, Mölndal, Sweden

Received June 13, 2003. Published on Web 09/27/2003.

* Corresponding author.

Support vector machine (SVM) and artificial neural network (ANN) systems were applied to a drug/nondrug classification problem as an example of binary decision problems in early-phase virtual compound filtering and screening. The results indicate that solutions obtained by SVM training seem to be more robust, with a smaller standard error, compared to ANN training. Generally, the SVM classifier yielded slightly higher prediction accuracy than ANN, irrespective of the type of descriptors used for molecule encoding, the size of the training data sets, and the algorithm employed for neural network training. The performance was compared using various descriptor sets and descriptor combinations based on the 120 standard Ghose-Crippen fragment descriptors, a wide range of 180 different properties and physicochemical descriptors from the Molecular Operating Environment (MOE) package, and 225 topological pharmacophore (CATS) descriptors. For the complete set of 525 descriptors, cross-validated classification by SVM yielded 82% correct predictions (Matthews cc = 0.63), whereas ANN reached 80% correct predictions (Matthews cc = 0.58). Although SVM outperformed the ANN classifiers with regard to overall prediction accuracy, both methods were shown to complement each other, as the sets of true positives, false positives (overprediction), true negatives, and false negatives (underprediction) produced by the two classifiers were not identical. The theory of SVM and ANN training is briefly reviewed.

INTRODUCTION

Early-phase virtual screening and compound library design often employs filtering routines which are based on binary classifiers and are meant to eliminate potentially unwanted molecules from a compound library.¹,² Currently two classifier systems are most often used in these applications: PLS-based classifiers³,⁴ and various types of artificial neural networks (ANN).⁵⁻⁹ Typically, these systems yield an average overall accuracy of 80% correct predictions for binary decision tasks following the "likeness concept" in virtual screening.²,¹⁰ The support vector machine (SVM) approach was first introduced by Vapnik as a potential alternative to conventional artificial neural networks.¹¹,¹² Its popularity has grown ever since in various areas of research, and first applications in molecular informatics and pharmaceutical research have been described. Although SVM can be applied to multiclass separation problems, its original implementation solves binary class/nonclass separation problems. Here we describe application of SVM to the drug/nondrug classification problem, which employs a class/nonclass implementation of SVM.

Both SVM and ANN algorithms can be formulated in terms of learning machines. The standard scenario for classifier development consists of two stages: training and testing. During the first stage the learning machine is presented with labeled samples, which are basically n-dimensional vectors with a class membership label attached. The learning machine generates a classifier for prediction of the class label of the input coordinates. During the second stage, the generalization ability of the model is tested.

Currently various sets of molecular descriptors are available.¹⁶ For application to drug/nondrug classification of compounds, the molecules are typically represented by n-dimensional vectors.⁶,⁷ In this work, we focused on the fragment-based Ghose-Crippen (GC) descriptors, which were used in the original work of Sadowski and Kubinyi for drug/nondrug classification,⁷ descriptors provided by the MOE software package (Molecular Operating Environment; Chemical Computing Group Inc., Montreal, Canada), and CATS topological pharmacophores.²⁰ Having defined this molecular representation, the task of the present study was to compare the classification ability of standard SVM and feed-forward ANN on the drug/nondrug data. A WWW-based interface for calculating the drug-likeness score of a molecule using our SVM solution based on the CATS descriptor was developed and can be found at URL: gecco.org.chemie.uni-frankfurt.de/gecco.html.

DATA AND METHODS

Data Sets. For SVM and ANN training we used the sets of drug and nondrug molecules prepared by Kubinyi and Sadowski.⁷ From the original data set 9208 molecules could be processed by our descriptor generation software. The final working set contained 4998 drugs and 4210 nondrug molecules. Three sets of descriptors were calculated: counts of the standard 120 Ghose-Crippen descriptors, 180 descriptors from MOE (Molecular Operating Environment; Chemical Computing Group Inc., Montreal, Canada), and 225 topological pharmacophore (CATS) descriptors.²⁰ MOE descriptors include various 2D and 3D descriptors such as volume and shape descriptors, atom and bond counts, Kier-Hall connectivity and kappa shape indices, adjacency and distance matrix descriptors, pharmacophore feature descriptors, partial charges, potential energy descriptors, and conformation-dependent charge descriptors. Before calculating MOE descriptors, single 3D conformers were generated by CORINA. CATS descriptors were calculated using our own software, taking into consideration pairs of atom types separated by up to 15 bonds (URL: gecco.org.chemie.uni-frankfurt.de/gecco.html).²⁰ All 225 descriptor columns were individually autoscaled. An alternative would have been block-scaling, where each descriptor class is autoscaled as a whole, which was not applied here.

Support Vector Machine. SVM classifiers are generated by a two-step procedure: First, the sample data vectors are mapped ("projected") to a very high-dimensional space. The dimension of this space is significantly larger than the dimension of the original data space. Then, the algorithm finds a hyperplane in this space with the largest margin separating classes of data. It was shown that classification accuracy usually depends only weakly on the specific projection, provided that the target space is sufficiently high dimensional.¹¹ Sometimes it is not possible to find the separating hyperplane even in a very high-dimensional space. In this case a tradeoff is introduced between the size of the separating margin and penalties for every vector which is within the margin.¹¹ The basic theory of SVM will be briefly reviewed in the following.

The separating hyperplane is defined as

$$D(x) = (w \cdot x) + w_0$$

Here $x$ is a sample vector mapped to a high-dimensional space, and $w$ and $w_0$ are parameters of the hyperplane that SVM will estimate.
Then the margin can be expressed as a minimal $\tau$ for which holds

$$y_k D(x_k) \ge \tau \lVert w \rVert, \quad k = 1, \ldots, n$$

Without loss of generality we can apply the constraint $\tau \lVert w \rVert = 1$ to $w$. In this case maximizing $\tau$ is equivalent to minimizing $\lVert w \rVert$, and SVM training becomes the problem of finding the minimum of a function with the following constraints:

$$\text{minimize} \quad \eta(w) = \tfrac{1}{2}(w \cdot w)$$

$$\text{subject to constraints} \quad y_i[(w \cdot x_i) + w_0] \ge 1$$

This problem is solved by introduction of Lagrange multipliers and minimization of the function

$$Q(w, w_0, \alpha) = \tfrac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \{ y_i[(w \cdot x_i) + w_0] - 1 \}$$

Here $\alpha_i$ are Lagrange multipliers. Differentiating over $w$ and $w_0$ and substituting we obtain

$$\text{max} \quad Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

$$\text{subject to constraints} \quad \sum_{i=1}^{n} y_i \alpha_i = 0; \quad \alpha_i \ge 0, \; i = 1, \ldots, n$$

When perfect separation is not possible, slack variables are introduced for sample vectors which are within the margin, and the optimization problem can be reformulated:

$$\text{minimize} \quad \eta(w) = \tfrac{1}{2}(w \cdot w) + C \sum_{i} \xi_i$$

$$\text{subject to constraints} \quad y_i[(w \cdot x_i) + w_0] \ge 1 - \xi_i$$

Here $\xi_i$ are slack variables. These variables are not equal to zero only for those vectors which are within the margin. Introducing Lagrange multipliers again we finally obtain

$$\text{max} \quad Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

$$\text{subject to constraints} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad C \ge \alpha_i \ge 0, \; i = 1, \ldots, n$$

This is a quadratic programming (QP) problem for which several efficient standard methods are known.²² Due to the very high dimensionality of the QP problem, which typically arises during SVM training, an extension of the algorithm for solving QP is used in SVM applications.²³

A geometrical illustration of the meaning of slack variables and Lagrange multipliers is given in Figure 1. Points classified by SVM can be divided into two groups, support vectors and nonsupport vectors. Nonsupport vectors are classified correctly by the hyperplane and are located outside the separating margin. Slack variables and Lagrange multipliers for them are equal to zero. Parameters of the hyperplane do not depend on them, and even if their position is changed the separating hyperplane and margin will remain unchanged, provided that these points stay outside the margin. Other points are support vectors, and they are the points which determine the exact position of the hyperplane. For all support vectors the absolute values of the slack variables are equal to the distances from these points to the edge of the separating margin. These distances are defined in units of half the width of the separating margin. For correctly classified points within the separating margin, slack variable values are between zero and one. For misclassified points within the margin the values of the slack variables are between one and two. For other misclassified points they are greater than two. For points that are lying on the edge of the margin, Lagrange multipliers are between zero and $C$, and slack variables for these points are still equal to zero. For all other points, for which the values of slack variables are larger than zero, Lagrange multipliers assume the value of $C$.

Figure 1. Principle of SVM classification. The task was to separate two classes of objects indicated by squares and circles. Squares represent nonclass samples ("negative examples", e.g., nondrugs) and circles are class members ("positive examples", e.g., drugs). $D(x)$ is the decision function defining class membership according to the SVM classifier, which is represented by the separating line ($D(x) = 0$). The margin is indicated by dotted lines. Support vectors are indicated by filled objects ($x_1$, $x_2$, $x_3$, $x_4$). $\xi_i$ are slack variables for support vectors that are not lying on the margin border. $y_i$ are label variables equal to 1 for positive examples (class membership) and -1 for negative examples (nonclass membership). See text for details.

Explicit mapping to a very high-dimensional space is not required if calculation of the scalar product in this high-dimensional space of every two vectors is feasible. This scalar product can be defined by introducing a kernel function $(x \cdot x') = K(x, x')$,²⁴ where $x$ and $x'$ are vectors in a low-dimensional space for which a kernel function that corresponds to a scalar product in a high-dimensional space is defined. Various kernels may be applied.²⁵ In our case, we used a kernel function of a fifth-order polynomial:

$$K(x, x') = ((x \cdot x')s + r)^5$$

This kernel corresponds to the decision function

$$f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i K(x_i^{sv}, x) + b \Big)$$

where $\alpha_i$ are Lagrange multipliers determined during training of SVM. The sum is only over support vectors $x^{sv}$. Lagrange multipliers for all other points are equal to zero.
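The fifth-order polynomial kernel and the decision function above can be illustrated with a minimal Python sketch. It is not the authors' implementation (they used the SVM-Light package); the function names `poly5_kernel`, `decision_value`, and `classify` are hypothetical, and the coefficients are assumed to already carry the sign of the class label, since the decision function in the text contains no explicit label term.

```python
from typing import Sequence

Vector = Sequence[float]


def poly5_kernel(x: Vector, x_prime: Vector, s: float, r: float = 1.0) -> float:
    """Fifth-order polynomial kernel K(x, x') = ((x . x') * s + r)^5."""
    dot = sum(a * b for a, b in zip(x, x_prime))
    return (dot * s + r) ** 5


def decision_value(
    x: Vector,
    support_vectors: Sequence[Vector],
    coefficients: Sequence[float],
    b: float,
    s: float,
    r: float = 1.0,
) -> float:
    """Argument of sign(): sum_i coef_i * K(x_i_sv, x) + b.

    Assumption: each coefficient is a Lagrange multiplier that already
    includes the sign of its class label (+1 or -1).
    """
    total = sum(
        coef * poly5_kernel(sv, x, s, r)
        for sv, coef in zip(support_vectors, coefficients)
    )
    return total + b


def classify(
    x: Vector,
    support_vectors: Sequence[Vector],
    coefficients: Sequence[float],
    b: float,
    s: float,
    r: float = 1.0,
) -> int:
    """Return +1 (class) or -1 (nonclass); a value of exactly 0 maps to +1."""
    value = decision_value(x, support_vectors, coefficients, b, s, r)
    return 1 if value >= 0 else -1
```

Only the support vectors enter the sum, so the cost of classifying one molecule grows with the number of support vectors rather than with the size of the training set.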
Parameter $b$ determines the shift of the hyperplane, and it is also found during SVM training. Simultaneous scaling of the $s$, $r$, and $b$ parameters does not change the decision function. Thus, we can simplify the kernel by setting $r$ equal to one:

$$K(x, x') = ((x \cdot x')s + 1)^5$$

In this case only the kernel parameter $s$ and the error tradeoff $C$ must be tuned. Parameter $C$ is not present explicitly in this equation; it is set up as a penalty for the misclassification error before the training of SVM is performed. For tuning parameters $s$ and $C$, four-times cross-validation of training data was applied, and values for $s$ and $C$ that maximize accuracy were then chosen. Accuracy maximization was performed by heuristics-based gradient descent.²⁶ Basically, the following procedure was applied. The data set was divided into two parts, a training and a validation set. The validation subset was put aside and used only for estimation of the performance of the trained classifier. Training data were divided into four nonoverlapping subsets. The SVM parameters to be determined were set to reasonable initial values. Then, the SVM was trained on the training data excluding one of the four subsets, and the performance of the obtained SVM classifier was estimated with the excluded subset. This procedure was repeated for each subset, and an average performance of the SVM classifier was obtained.

Figure 2. Architecture of artificial neural networks. Formal neurons are drawn as circles; weights are represented by lines connecting the neuron layers. Fan-out neurons are drawn in white, sigmoidal units in black, and linear units in gray. (a) Conventional three-layered feed-forward system ("architecture I"); (b) network architecture used by Ajay and co-workers for drug-likeness prediction ("architecture II").⁶

For SVM training we used freely available SVM software (SVM-Light package; URL: org/).²⁶,²⁷ A Linux-based LSF (Load Sharing Facility; Platform Computing GmbH, Ratingen, Germany) cluster was used for determination of the cross-validation error to reduce calculation time.
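The four-fold cross-validation used for tuning $s$ and $C$ can be sketched as follows. This is a simplified illustration under stated assumptions: it replaces the heuristics-based gradient descent mentioned in the text with a plain grid search, and `make_scorer` is a hypothetical callback that, for given values of $s$ and $C$, returns a function which trains a classifier on the given training indices and reports its accuracy on the held-out indices. No particular SVM library is assumed.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (train_indices, held_out_indices) -> accuracy on the held-out subset
Scorer = Callable[[List[int], List[int]], float]


def k_fold_indices(n_samples: int, k: int = 4, seed: int = 0) -> List[List[int]]:
    """Split range(n_samples) into k nonoverlapping, shuffled subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(
    n_samples: int, train_and_score: Scorer, k: int = 4, seed: int = 0
) -> float:
    """Train on k-1 subsets, score on the excluded one, and average over subsets."""
    scores = []
    for held_out in k_fold_indices(n_samples, k, seed):
        excluded = set(held_out)
        train = [i for i in range(n_samples) if i not in excluded]
        scores.append(train_and_score(train, held_out))
    return sum(scores) / len(scores)


def grid_search(
    n_samples: int,
    make_scorer: Callable[[float, float], Scorer],
    s_values: Sequence[float],
    c_values: Sequence[float],
    k: int = 4,
) -> Tuple[float, float, float]:
    """Return (best accuracy, s, C) over all combinations of s and C."""
    if not s_values or not c_values:
        raise ValueError("s_values and c_values must not be empty")
    best = None
    for s in s_values:
        for c in c_values:
            accuracy = cross_validated_accuracy(n_samples, make_scorer(s, c), k)
            if best is None or accuracy > best[0]:
                best = (accuracy, s, c)
    return best
```

The validation subset that was put aside is deliberately absent from this sketch: only the training data are split into folds, so the final performance estimate stays independent of the parameter search.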
All calculations were performed using the MATLAB package (MATLAB 2002, The mathematical laboratory; The MathWorks GmbH, Aachen, Germany).

ARTIFICIAL NEURAL NETWORK

Conventional two-layered neural networks with a single output neuron were used for ANN model development (Figure 2a).²⁶ As a result of network training a decision function is chosen from the family of functions represented by the network architecture. This function family is defined by the complexity of the neural network: number of hidden layers, number of neurons in these layers, and topology of the network. The decision function is determined by choosing appropriate weights for the neural network. Optimal weights usually minimize an error function for the particular network architecture. The error function describes the deviation of predicted target values from observed or desired values. For our class/nonclass classification problem the target values were 1 for class (drugs) and -1 for nonclass (nondrugs). A standard two-layered neural network with a single output neuron can be represented by the following equation

$$y = \tilde{g}\Big( \sum_{j=1}^{M} w_{1j}^{(2)} \, g\Big( \sum_{i=1}^{d} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \Big) + w_{10}^{(2)} \Big)$$

with the error function $E = \sum_{k=1}^{n} (y(x_k) - y_k)^2$. In this work, $\tilde{g}$ is a linear function and $g$ is a tan-sigmoid transfer function. A second type of network architecture containing additional connections from the input layer to the output layer was trained to reimplement the original drug/nondrug ANN developed by Ajay and co-workers (Figure 2b).⁶

Training of neural networks is typically performed with variations of gradient-descent-based algorithms,²⁶ trying to minimize an error function. To avoid overfitting, cross-validation can be used for finding an earlier point of training.²⁸ In this work the neural network toolbox from MATLAB was used. Data were preprocessed identically to SVM-based learning. We applied the following training algorithms to ANN optimization in their default versions provided by MATLAB: gradient descent with variable learning rate,²⁹,³⁰ conjugated gradient descent,³⁰,³¹ scaled conjugated gradient descent,³² quasi-Newton algorithm,³³ Levenberg-Marquardt (LM),³⁴,³⁵ and automated regularization.³⁶ For each optimization ten-times cross-validation was performed (80+20 splits into training and test data), where the ANN weights and biases were optimized using the training data, and prediction accuracy was measured using test data to determine the number of training epochs, i.e., the endpoint of the training process. This was performed to reduce the risk of overfitting. It should be noted that the validation data were left untouched.

Table 1. Cross-Validated Results of Machine Learningᵃ
Columns: descriptors; % correct (ANN, SVM); Matthews cc (ANN, SVM). Rows: GC; MOE; CATS; all (GC+MOE+CATS).
ᵃ Average values and standard deviations are given. The Levenberg-Marquardt training method was used for ANN training.

MODEL VALIDATION

The SVM model for drug/nondrug classification of a pattern $x$ was

$$\mathrm{SVM}(x) = \sum_{i} a_i K(x_i^{SV}, x) + b$$

Here, $i$ runs only over support vectors (SV). The value of SVM($x$) is either positive ("drug") or negative ("nondrug"). The ANN model for drug/nondrug classification produced values in ]-1,1[, where a positive value meant "drug" and a negative value "nondrug". Classification accuracy was evaluated based on prediction accuracy, i.e., percent of test compounds correctly classified, and the correlation coefficient according to Matthews:³⁷

$$cc = \frac{NP - OU}{\sqrt{(N + O)(N + U)(P + O)(P + U)}}$$

where $P$, $N$, $O$, and $U$ are the numbers of true positive, true negative, false positive, and false negative predictions, respectively.
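As a worked illustration of the two accuracy measures, the following Python sketch computes the fraction of correct predictions and the Matthews correlation coefficient from the four counts P, N, O, and U defined above. The function names are hypothetical, and returning 0 when the denominator vanishes is a convention assumed here, not something stated in the text.

```python
import math


def fraction_correct(p: int, n: int, o: int, u: int) -> float:
    """Fraction of correctly classified compounds: (P + N) / (P + N + O + U)."""
    return (p + n) / (p + n + o + u)


def matthews_cc(p: int, n: int, o: int, u: int) -> float:
    """Matthews correlation coefficient.

    p: true positives, n: true negatives,
    o: false positives (overprediction), u: false negatives (underprediction).
    """
    denominator = math.sqrt((n + o) * (n + u) * (p + o) * (p + u))
    if denominator == 0:
        # Assumed convention: cc is undefined here, so report no correlation.
        return 0.0
    return (n * p - o * u) / denominator
```

For example, 40 true positives, 40 true negatives, 10 false positives, and 10 false negatives give 80% correct predictions and cc = (1600 - 100) / 2500 = 0.6.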
Drugs were considered as the positive set; the nondrug molecules formed the negative set. The values of cc can range from -1 to 1. Perfect prediction gives a correlation coefficient of 1.

SVM and ANN models were developed using various sizes of training data to measure the influence of the size of the training set on the quality of the classification model. The number of training samples was iteratively diminished: Starting with a random split of all available samples into training and validation subsets, at each of the following iterations we diminished the size of the training set to only 80% of the number of samples of the previous iteration. This allowed us to obtain better sampling for small training sets. 10-times cross-validation was performed, and average values of prediction accuracy and cc were calculated.

RESULTS AND DISCUSSION

The main aim of this study was to compare SVM and ANN classifiers in their ability to distinguish between sets of drugs and nondrugs. We trained different neural network topologies, and performance of the best network was compared to the SVM classifier. Two types of ANN architecture were considered: standard feed-forward networks with one hidden layer ("architecture I") and a feed-forward network with one hidden layer with additional direct connections from input neurons to the output ("architecture II") (Figure 2). The first type of ANN was used by Sadowski and Kubinyi in their original work on drug-likeness prediction;⁷ the second architecture was employed by Ajay and co-workers serving the same purpose.⁶ Using these networks and the GC descriptors in combination with the Levenberg-Marquardt training method, classification accuracy was identical to the original results (on average 80% correct) despite the use of a different training technique and different training data (Table 1). This observation substantiates the original findings. Both network types performed identically considering the error margin (approximately 80% correct classification).
For some of the training algorithms, a slightly lower standard deviation of the prediction accuracy was observed for architecture I (data not shown). Since the additional connections in network architecture II did not contribute to a greater accuracy of the model, we used only the standard feed-forward network with one hidden layer containing two neurons (architecture I) for further analysis. For each training method and combination of input variables (descriptors), networks with different numbers of hidden neurons (2-10 neurons) were trained. We did not observe an overall best training algorithm. The Levenberg-Marquardt method was used for the development of the final ANN model. Also, we did not observe an improved classification result when the number of hidden neurons was larger than two (data not shown). ANN architecture I with two hidden neurons yielded the overall best cross-validated prediction result for all descriptors (GC+MOE+CATS), 80% correct predictions (cc = 0.58). The rank order of descriptor sets with regard to the overall classification accuracy was as follows: All > GC > MOE > CATS (Table 1). It should be stressed that the differences in classification accuracy are minute for the descriptors All, MOE, and GC and should be regarded as comparable considering a standard deviation of 1%. The CATS descriptor led to approximately 5% lower accuracy.

Figure 3. Average cross-validated prediction accuracy (fraction correct) of SVM and ANN classifiers optimized by various training schemes for GC descriptors (upper graph: logarithmic scale; lower graph: linear scale).

SVM training resulted in models showing slightly higher prediction accuracy than the ANN systems (Table 1). A 1-2% gain was observed, independent of the number of training samples and the method used for neural network training. Figures 3 and 4 illustrate the dependency of the classification accuracy on the number of sample molecules used for training. In one experiment only GC descriptors were used (Figure 3); in a second study the combination of GC, MOE, and CATS descriptors was employed (Figure 4). With the GC descriptor the SVM estimator only slightly outperforms the neural networks (Figure 3). Similar results were obtained if only MOE or CATS descriptors were used for training (data not shown). The situation changed when all descriptors were used. With the complete descriptor set (525-dimensional) SVM clearly outperforms the neural network system (Figure 4). These results substantiate earlier findings that SVM performs better than ANN when large numbers of features or descriptors are used.¹² A general observation was that classification accuracy significantly improved with an increasing number of training samples, reaching a plateau in performance between 2000 and 3000 samples (Figures 3 and 4). The accuracy curves represent almost ideal learning behavior. It should be mentioned that the performance plateau observed does not reflect an inherent clustering of the data set, as training data subsets were randomly selected from the pool. The fraction correctly predicted grows from approximately 65% to 80% when the training set is increased by a factor of 250. The combination of MOE, GC, and CATS descriptors improved classification accuracy by approximately two percent for SVM and by one percent for ANN compared to models based on individual descriptors.

These results demonstrate that optimal ANN training depends to a large extent on the number of training patterns available and the type of molecular descriptors used. For instance, for GC descriptors the best learning algorithm was training with automated regularization, but for the combination of GC, MOE, and CATS descriptors this algorithm was extremely slow and converged relatively unstably. In contrast, SVM generally performed more stably compared to ANN, with only a small increase in computation time for both sets of descriptors (Figures 3 and 4).

Figure 4. Average cross-validated prediction accuracy (fraction correct) of SVM and ANN classifiers optimized by various training schemes for the combination of GC, MOE, and CATS descriptors (upper graph: logarithmic scale; lower graph: linear scale).

In a previous comparison of SVM to several machine learning methods by Holden and co-workers it was shown that an SVM classifier outperformed other standard methods, but a specially designed and structurally optimized neural network was again superior to the SVM model in a benchmark test.¹³ This is supported by the observation that in the present study the set of molecules which were correctly classified by both SVM and ANN ("mutual true positives") was 72% on average, and the fraction incorrectly classified by both systems ("mutual false negatives") was 11%. 10% of the test data were correctly predicted by SVM but failed by ANN, and 6% were correctly classified by ANN but not by SVM using the full set of descriptors (GC+MOE+CATS). Examples of the latter two sets of molecules are shown in Figure 5. Clearly, the ANN classifier and the SVM classifier complement each other, and both methods could be further optimized, for example, by changing the SVM kernel or by exploring more sophisticated ANN architectures and concepts.

Fast classifier systems are mainly developed for first-pass virtual screening, in particular for identification ("flagging") of potentially undesired molecules in very large compound collections.² Due to its robust convergence behavior SVM seems to be well-suited for solving binary decision problems in molecular informatics, especially when a large number of descriptors is available for characterization of molecules. In this study we have shown that two drug-likeness estimators can produce complementary predictions. We recommend the parallel application of both predictive systems for virtual screening applications. One possibility to combine several estimators for drug-likeness or any other classification task is to employ a "jury decision", e.g., calculate an ensemble average.³⁸,³⁹ As more and more different predictors become available for virtual screening, a meaningful combination of prediction systems that exploits the individual strengths of the different methods will be pivotal for reliable compound library filtering.

Figure 5. Examples of drugs correctly classified by ANN but not by SVM (structures 1-5), and drugs correctly classified by SVM but not by ANN (structures 6-10).

CONCLUSION

It was demonstrated that the SVM system used in this study has the capacity to produce higher overall prediction accuracy than a particular ANN architecture. Based on this observation we conclude that SVM represents a useful method for classification tasks in QSAR modeling and virtual screening, especially when large numbers of input variables are used. The SVM classifier was shown to complement the predictions obtained by ANN. The SVM and ANN classifiers obtained for drug-likeness prediction are comparable in overall accuracy and produce overlapping, yet not identical sets of correctly classified and misclassified compounds. A similar observation can be made when two ANN models are compared. Different ANN architectures and training algorithms were shown to lead to different classification results. Therefore, it might be wise to apply several predictive models in parallel, irrespective of their nature, i.e., being SVM- or ANN-based. We wish to stress that our study does not justify the conclusion that SVM outperforms ANN in general. In the present work only a standard feed-forward network with a fixed number of hidden neurons was compared to a standard SVM implementation. Nevertheless, our results indicate that solutions obtained by SVM training seem to be more robust, with a smaller standard error, compared to standard ANN training. Irrespective of the outcome of this study, it is the appropriate choice of training data and descriptors, and reasonable scaling of input variables that determines the success or failure of machine learning systems. Both methods are suited to assess the usefulness of different descriptor sets for a given classification task, and they are methods of choice for rapid first-pass filtering of compound libraries.⁴⁰ A particular advantage of SVM is sparseness of the solution. This means that an SVM classifier depends only on the support vectors, and the classifier function is not influenced by the whole data set, as is the case for many neural network systems. Another characteristic of SVM is the possibility to efficiently deal with a very large number of features due to the exploitation of kernel functions, which makes it an attractive technique, e.g., for gene chip analysis or high-dimensional chemical spaces. The combination of SVM with a feature selection routine might provide an efficient tool for extracting chemically relevant information.

ACKNOWLEDGMENT

The authors are grateful to Norbert Dichter and Ralf Tomczak for setting up the LSF Linux cluster. Alireza Givehchi is thanked for assistance in installing the gecco! Web interface. This work was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt.

REFERENCES AND NOTES

(1) Clark, D. E.; Pickett, S. D. Computational methods for the prediction of drug-likeness. Drug Discov. Today 2000, 5.
(2) Schneider, G.; Böhm, H.-J. Virtual screening and fast automated docking methods. Drug Discov. Today 2002, 7.
(3) Wold, S. Exponentially weighted moving principal component analysis and projections to latent structures. Chemomet. Intell. Lab. Syst. 1994, 23.
(4) Forina, M.; Casolino, M. C.; de la Pezuela Martinez, C. Multivariate calibration: applications to pharmaceutical analysis. J. Pharm. Biomed. Anal. 1998, 18.
(5) Neural Networks in QSAR and Drug Design; Devillers, J., Ed.; Academic Press: London.
(6) Ajay; Walters, W. P.; Murcko, M. A. Can we learn to distinguish between drug-like and nondrug-like molecules? J. Med. Chem. 1998, 41.
(7) Sadowski, J.; Kubinyi, H. A scoring scheme for discriminating between drugs and nondrugs. J. Med. Chem. 1998, 41.
(8) Sadowski, J. Optimization of chemical libraries by neural networks. Curr. Opin. Chem. Biol. 2000, 4.
(9) Schneider, G. Neural networks are useful tools for drug design. Neural Networks 2000, 13.
(10) Sadowski, J. In Virtual Screening for Bioactive Molecules; Böhm, H.-J., Schneider, G., Eds.; Wiley-VCH: Weinheim, 2000.
(11) Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning 1995, 20.
(12) Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin.
(13) Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S. Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput. Chem. 2001, 26.
(14) Warmuth, M. K.; Liao, J.; Ratsch, G.; Mathieson, M.; Putta, S.; Lemmen, C. Active learning with Support Vector Machines in the drug discovery process. J. Chem. Inf. Comput. Sci. 2003, 43.
(15) Wilton, D.; Willett, P.; Lawson, K.; Mullier, G. Comparison of ranking methods for virtual screening in lead-discovery programs. J. Chem. Inf. Comput. Sci. 2003, 43.
(16) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim.
(17) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 1. Partition coefficients as a measure of hydrophobicity. J. Comput. Chem. 1986, 7.
(18) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 2. Modeling dispersive and hydrophobic interactions. J. Comput. Chem. 1987, 27.
(19) Ghose, A. K.; Pritchett, A.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships 3. J. Comput. Chem. 1988, 9.
(20) Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. Scaffold-hopping by topological pharmacophore search: a contribution to virtual screening. Angew. Chem., Int. Ed. Engl. 1999, 38.
(21) Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Comput. Methods 1990, 3.
(22) Coleman, T. F.; Li, Y. A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J. Optimization 1996, 6.
(23) Joachims, T. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning; Schölkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, 1999.
(24) Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods; Cambridge University Press: Cambridge.
(25) Burges, C. J. C. A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 1998, 2.
(26) Bishop, C. M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford.
(27) Joachims, T. Learning to classify text using Support Vector Machines. Kluwer International Series in Engineering and Computer Science 668; Kluwer Academic Publishers: Boston.
(28) Duda, R. O.; Hart, P. E.; Stork, D. G. Pattern Classification; Wiley-Interscience: New York.
(29) Rumelhart, D. E.; McClelland, J. L.; The PDP Research Group. Parallel Distributed Processing; MIT Press: Cambridge, MA.
(30) Hagan, M. T.; Demuth, H. B.; Beale, M. H. Neural Network Design; PWS Publishing: Boston.
(31) Fletcher, R.; Reeves, C. M. Function minimization by conjugate gradients. Comput. J. 1964, 7.
(32) Moller, M. F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 1993, 6.
(33) Dennis, J. E.; Schnabel, R. B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall: Englewood Cliffs.
(34) Hagan, M. T.; Menhaj, M. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 1994, 5.
(35) Foresee, F. D.; Hagan, M. T. Gauss-Newton approximation to Bayesian regularization. Proceedings of the 1997 International Joint Conference on Neural Networks.
(36) MacKay, D. J. C. Bayesian interpolation. Neural Comput. 1992, 4.
(37) Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405.
(38) Krogh, A.; Sollich, P. Statistical mechanics of ensemble learning. Phys. Rev. E 1997, 55.
(39) Baldi, P.; Brunak, S. Bioinformatics - The Machine Learning Approach; MIT Press: Cambridge.
(40) Byvatov, E.; Schneider, G. Support vector machine applications in bioinformatics. Appl. Bioinf. 2003, 2.

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

Review: Classification Outline

Review: Classification Outline Data Miig CS 341, Sprig 2007 Decisio Trees Neural etworks Review: Lecture 6: Classificatio issues, regressio, bayesia classificatio Pretice Hall 2 Data Miig Core Techiques Classificatio Clusterig Associatio

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: bcm1@cec.wustl.edu Supervised

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

CHAPTER 3 THE TIME VALUE OF MONEY

CHAPTER 3 THE TIME VALUE OF MONEY CHAPTER 3 THE TIME VALUE OF MONEY OVERVIEW A dollar i the had today is worth more tha a dollar to be received i the future because, if you had it ow, you could ivest that dollar ad ear iterest. Of all

More information

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff,

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff, NEW HIGH PERFORMNCE COMPUTTIONL METHODS FOR MORTGGES ND NNUITIES Yuri Shestopaloff, Geerally, mortgage ad auity equatios do ot have aalytical solutios for ukow iterest rate, which has to be foud usig umerical

More information

Linear classifier MAXIMUM ENTROPY. Linear regression. Logistic regression 11/3/11. f 1

Linear classifier MAXIMUM ENTROPY. Linear regression. Logistic regression 11/3/11. f 1 Liear classifier A liear classifier predicts the label based o a weighted, liear combiatio of the features predictio = w 0 + w 1 f 1 + w 2 f 2 +...+ w m f m For two classes, a liear classifier ca be viewed

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

7. Sample Covariance and Correlation

7. Sample Covariance and Correlation 1 of 8 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 7. Sample Covariace ad Correlatio The Bivariate Model Suppose agai that we have a basic radom experimet, ad that X ad Y

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

Generalization Dynamics in LMS Trained Linear Networks

Generalization Dynamics in LMS Trained Linear Networks Geeralizatio Dyamics i LMS Traied Liear Networks Yves Chauvi Psychology Departmet Staford Uiversity Staford, CA 94305 Abstract For a simple liear case, a mathematical aalysis of the traiig ad geeralizatio

More information

Cantilever Beam Experiment

Cantilever Beam Experiment Mechaical Egieerig Departmet Uiversity of Massachusetts Lowell Catilever Beam Experimet Backgroud A disk drive maufacturer is redesigig several disk drive armature mechaisms. This is the result of evaluatio

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

More information

(VCP-310) 1-800-418-6789

(VCP-310) 1-800-418-6789 Maual VMware Lesso 1: Uderstadig the VMware Product Lie I this lesso, you will first lear what virtualizatio is. Next, you ll explore the products offered by VMware that provide virtualizatio services.

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

Research Article Sign Data Derivative Recovery

Research Article Sign Data Derivative Recovery Iteratioal Scholarly Research Network ISRN Applied Mathematics Volume 0, Article ID 63070, 7 pages doi:0.540/0/63070 Research Article Sig Data Derivative Recovery L. M. Housto, G. A. Glass, ad A. D. Dymikov

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

Spam Detection. A Bayesian approach to filtering spam

Spam Detection. A Bayesian approach to filtering spam Spam Detectio A Bayesia approach to filterig spam Kual Mehrotra Shailedra Watave Abstract The ever icreasig meace of spam is brigig dow productivity. More tha 70% of the email messages are spam, ad it

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

Theorems About Power Series

Theorems About Power Series Physics 6A Witer 20 Theorems About Power Series Cosider a power series, f(x) = a x, () where the a are real coefficiets ad x is a real variable. There exists a real o-egative umber R, called the radius

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

Application and research of fuzzy clustering analysis algorithm under micro-lecture English teaching mode

Application and research of fuzzy clustering analysis algorithm under micro-lecture English teaching mode SHS Web of Cofereces 25, shscof/20162501018 Applicatio ad research of fuzzy clusterig aalysis algorithm uder micro-lecture Eglish teachig mode Yig Shi, Wei Dog, Chuyi Lou & Ya Dig Qihuagdao Istitute of

More information

Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Plug-i martigales for testig exchageability o-lie Valetia Fedorova, Alex Gammerma, Ilia Nouretdiov, ad Vladimir Vovk Computer Learig Research Cetre Royal Holloway, Uiversity of Lodo, UK {valetia,ilia,alex,vovk}@cs.rhul.ac.uk

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics

More information

Scalable Biomedical Named Entity Recognition: Investigation of a Database-Supported SVM Approach

Scalable Biomedical Named Entity Recognition: Investigation of a Database-Supported SVM Approach Scalable Biomedical Named Etity Recogitio: Ivestigatio of a Database-Supported SVM Approach Moa Solima Habib * ad Jugal Kalita Departmet of Computer Sciece Uiversity of Colorado, 1420 Austi Bluffs Pkwy

More information

Now here is the important step

Now here is the important step LINEST i Excel The Excel spreadsheet fuctio "liest" is a complete liear least squares curve fittig routie that produces ucertaity estimates for the fit values. There are two ways to access the "liest"

More information

Subject CT5 Contingencies Core Technical Syllabus

Subject CT5 Contingencies Core Technical Syllabus Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number. GCSE STATISTICS You should kow: 1) How to draw a frequecy diagram: e.g. NUMBER TALLY FREQUENCY 1 3 5 ) How to draw a bar chart, a pictogram, ad a pie chart. 3) How to use averages: a) Mea - add up all

More information

Totally Corrective Boosting Algorithms that Maximize the Margin

Totally Corrective Boosting Algorithms that Maximize the Margin Mafred K. Warmuth mafred@cse.ucsc.edu Ju Liao liaoju@cse.ucsc.edu Uiversity of Califoria at Sata Cruz, Sata Cruz, CA 95064, USA Guar Rätsch Guar.Raetsch@tuebige.mpg.de Friedrich Miescher Laboratory of

More information

5: Introduction to Estimation

5: Introduction to Estimation 5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

3. Covariance and Correlation

3. Covariance and Correlation Virtual Laboratories > 3. Expected Value > 1 2 3 4 5 6 3. Covariace ad Correlatio Recall that by takig the expected value of various trasformatios of a radom variable, we ca measure may iterestig characteristics

More information

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows:

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows: Subettig Subettig is used to subdivide a sigle class of etwork i to multiple smaller etworks. Example: Your orgaizatio has a Class B IP address of 166.144.0.0 Before you implemet subettig, the Network

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is 0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean 1 Social Studies 201 October 13, 2004 Note: The examples i these otes may be differet tha used i class. However, the examples are similar ad the methods used are idetical to what was preseted i class.

More information

Biology 171L Environment and Ecology Lab Lab 2: Descriptive Statistics, Presenting Data and Graphing Relationships

Biology 171L Environment and Ecology Lab Lab 2: Descriptive Statistics, Presenting Data and Graphing Relationships Biology 171L Eviromet ad Ecology Lab Lab : Descriptive Statistics, Presetig Data ad Graphig Relatioships Itroductio Log lists of data are ofte ot very useful for idetifyig geeral treds i the data or the

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. Notes o expoetial geeratig fuctios ad structures. 1. The cocept of a structure. Cosider the followig coutig problems: (1) to fid for each the umber of partitios of a -elemet set, (2) to fid for each the

More information

ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC

ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC 8 th Iteratioal Coferece o DEVELOPMENT AND APPLICATION SYSTEMS S u c e a v a, R o m a i a, M a y 25 27, 2 6 ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC Vadim MUKHIN 1, Elea PAVLENKO 2 Natioal Techical

More information

Mathematical goals. Starting points. Materials required. Time needed

Mathematical goals. Starting points. Materials required. Time needed Level A1 of challege: C A1 Mathematical goals Startig poits Materials required Time eeded Iterpretig algebraic expressios To help learers to: traslate betwee words, symbols, tables, ad area represetatios

More information

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n We will cosider the liear regressio model i matrix form. For simple liear regressio, meaig oe predictor, the model is i = + x i + ε i for i =,,,, This model icludes the assumptio that the ε i s are a sample

More information

Reliability Analysis in HPC clusters

Reliability Analysis in HPC clusters Reliability Aalysis i HPC clusters Narasimha Raju, Gottumukkala, Yuda Liu, Chokchai Box Leagsuksu 1, Raja Nassar, Stephe Scott 2 College of Egieerig & Sciece, Louisiaa ech Uiversity Oak Ridge Natioal Lab

More information

Cutting-Plane Training of Structural SVMs

Cutting-Plane Training of Structural SVMs Cuttig-Plae Traiig of Structural SVMs Thorste Joachims, Thomas Filey, ad Chu-Nam Joh Yu Abstract Discrimiative traiig approaches like structural SVMs have show much promise for buildig highly complex ad

More information

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction THE ARITHMETIC OF INTEGERS - multiplicatio, expoetiatio, divisio, additio, ad subtractio What to do ad what ot to do. THE INTEGERS Recall that a iteger is oe of the whole umbers, which may be either positive,

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

HCL Dynamic Spiking Protocol

HCL Dynamic Spiking Protocol ELI LILLY AND COMPANY TIPPECANOE LABORATORIES LAFAYETTE, IN Revisio 2.0 TABLE OF CONTENTS REVISION HISTORY... 2. REVISION.0... 2.2 REVISION 2.0... 2 2 OVERVIEW... 3 3 DEFINITIONS... 5 4 EQUIPMENT... 7

More information

Evaluation of Different Fitness Functions for the Evolutionary Testing of an Autonomous Parking System

Evaluation of Different Fitness Functions for the Evolutionary Testing of an Autonomous Parking System Evaluatio of Differet Fitess Fuctios for the Evolutioary Testig of a Autoomous Parkig System Joachim Wegeer 1, Oliver Bühler 2 1 DaimlerChrysler AG, Research ad Techology, Alt-Moabit 96 a, D-1559 Berli,

More information

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series utomatic Tuig for FOREX Tradig System Usig Fuzzy Time Series Kraimo Maeesilp ad Pitihate Soorasa bstract Efficiecy of the automatic currecy tradig system is time depedet due to usig fixed parameters which

More information

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology Adoptio Date: 4 March 2004 Effective Date: 1 Jue 2004 Retroactive Applicatio: No Public Commet Period: Aug Nov 2002 INVESTMENT PERFORMANCE COUNCIL (IPC) Preface Guidace Statemet o Calculatio Methodology

More information

Unit 20 Hypotheses Testing

Unit 20 Hypotheses Testing Uit 2 Hypotheses Testig Objectives: To uderstad how to formulate a ull hypothesis ad a alterative hypothesis about a populatio proportio, ad how to choose a sigificace level To uderstad how to collect

More information

9.8: THE POWER OF A TEST

9.8: THE POWER OF A TEST 9.8: The Power of a Test CD9-1 9.8: THE POWER OF A TEST I the iitial discussio of statistical hypothesis testig, the two types of risks that are take whe decisios are made about populatio parameters based

More information

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find 1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.

More information

Divide and Conquer. Maximum/minimum. Integer Multiplication. CS125 Lecture 4 Fall 2015

Divide and Conquer. Maximum/minimum. Integer Multiplication. CS125 Lecture 4 Fall 2015 CS125 Lecture 4 Fall 2015 Divide ad Coquer We have see oe geeral paradigm for fidig algorithms: the greedy approach. We ow cosider aother geeral paradigm, kow as divide ad coquer. We have already see a

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information

Overview on S-Box Design Principles

Overview on S-Box Design Principles Overview o S-Box Desig Priciples Debdeep Mukhopadhyay Assistat Professor Departmet of Computer Sciece ad Egieerig Idia Istitute of Techology Kharagpur INDIA -721302 What is a S-Box? S-Boxes are Boolea

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

JJMIE Jordan Journal of Mechanical and Industrial Engineering

JJMIE Jordan Journal of Mechanical and Industrial Engineering JJMIE Jorda Joural of Mechaical ad Idustrial Egieerig Volume 5, Number 5, Oct. 2011 ISSN 1995-6665 Pages 439-446 Modelig Stock Market Exchage Prices Usig Artificial Neural Network: A Study of Amma Stock

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

Convention Paper 6764

Convention Paper 6764 Audio Egieerig Society Covetio Paper 6764 Preseted at the 10th Covetio 006 May 0 3 Paris, Frace This covetio paper has bee reproduced from the author's advace mauscript, without editig, correctios, or

More information

THE problem of fitting a circle to a collection of points

THE problem of fitting a circle to a collection of points IEEE TRANACTION ON INTRUMENTATION AND MEAUREMENT, VOL. XX, NO. Y, MONTH 000 A Few Methods for Fittig Circles to Data Dale Umbach, Kerry N. Joes Abstract Five methods are discussed to fit circles to data.

More information

NATIONAL SENIOR CERTIFICATE GRADE 12

NATIONAL SENIOR CERTIFICATE GRADE 12 NATIONAL SENIOR CERTIFICATE GRADE MATHEMATICS P EXEMPLAR 04 MARKS: 50 TIME: 3 hours This questio paper cosists of 8 pages ad iformatio sheet. Please tur over Mathematics/P DBE/04 NSC Grade Eemplar INSTRUCTIONS

More information

Data Analysis and Statistical Behaviors of Stock Market Fluctuations

Data Analysis and Statistical Behaviors of Stock Market Fluctuations 44 JOURNAL OF COMPUTERS, VOL. 3, NO. 0, OCTOBER 2008 Data Aalysis ad Statistical Behaviors of Stock Market Fluctuatios Ju Wag Departmet of Mathematics, Beijig Jiaotog Uiversity, Beijig 00044, Chia Email:

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

1. Introduction. Scheduling Theory

1. Introduction. Scheduling Theory . Itroductio. Itroductio As a idepedet brach of Operatioal Research, Schedulig Theory appeared i the begiig of the 50s. I additio to computer systems ad maufacturig, schedulig theory ca be applied to may

More information

Recursion and Recurrences

Recursion and Recurrences Chapter 5 Recursio ad Recurreces 5.1 Growth Rates of Solutios to Recurreces Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer. Cosider, for example,

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information
