Heuristic Static Load-Balancing Algorithm Applied to the Fragment Molecular Orbital Method


Yuri Alexeev*, Ashutosh Mahajan*, Sven Leyffer, Graham Fletcher
Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA

Dmitri G. Fedorov
National Institute of Advanced Industrial Science and Technology, Central 2, Umezono, Tsukuba, Japan

Abstract

In the era of petascale supercomputing, load balancing is crucial. Although dynamic load balancing is widespread, it is increasingly difficult to implement effectively with thousands of processors or more, prompting a second look at static load-balancing techniques even though the optimal allocation of tasks to processors is an NP-hard problem. We propose a heuristic static load-balancing algorithm, employing fitted benchmarking data, as an alternative to dynamic load balancing. The problem of allocating CPU cores to tasks is formulated as a mixed-integer nonlinear optimization problem, which is solved by using an optimization solver. On 163,840 cores of Blue Gene/P, we achieved a parallel efficiency of 80% for an execution of the fragment molecular orbital method applied to model protein-ligand complexes quantum-mechanically. The obtained allocation is shown to outperform dynamic load balancing by at least a factor of 2, thus motivating the use of this approach on other coarse-grained applications.

Keywords: Dynamic load balancing, static load balancing, heuristic algorithm, quantum chemistry, GAMESS, fragment molecular orbitals, FMO, optimization, MINLP, protein-ligand complex

I. INTRODUCTION

Achieving an even load balance is a key issue in parallel computing, and increasingly so as we enter the petascale supercomputing era. By Amdahl's law, the scalable component of the total wall time shrinks as the number of processors increases, while the load imbalance, together with the constant sequential component, acts to retard the scalability.
Although parallelization of sequential code often requires rewriting the code, adopting an efficient load-balancing scheme can be a simple and effective way to boost scalability and performance. Dynamic load balancing (DLB) and static load balancing (SLB) are two broad classes of load-balancing algorithms. Whereas SLB relies on previously obtained knowledge (for example, benchmarking data) or consistent task sizes, DLB dynamically assigns jobs to processors during code execution. Many variations on SLB and DLB algorithms adapted for specific applications have been reported [1-4], using different techniques such as random stealing [5, 6], simulated annealing [7], recursive bisection methods [8-10], space-filling curve partitioning [11-14], and graph partitioning [15-21].

*YA and AM contributed equally to this work.

SLB is usually simple to implement and has negligible overhead, making it suitable for fine-grained parallelism consisting of many small tasks. However, if the application involves much larger tasks of diverse sizes, as is often the case with coarse-grained parallelism, DLB may be preferred. Since many applications naturally involve widely differing task sizes, DLB algorithms have become widespread.

Indeed, as the number of available processors increases (for instance, when moving from a PC cluster environment to a large modern supercomputer), many applications find it advantageous to allocate work in larger chunks in the interest of reducing overhead. In the shift from fine- to coarse-grained parallelism, DLB may seem to be the natural choice. However, the DLB schemes suitable for a PC cluster often perform poorly on many thousands of processors, prompting the search for load-balancing paradigms that can handle diverse task sizes with minimal overhead. One possibility is to adapt SLB techniques to pre-allocate tasks more effectively by drawing on a deeper understanding of the application at hand. However, the optimal static mapping of jobs to more than two processors is, in general, an NP-hard problem [22, 23].
Nevertheless, such SLB methods have been successfully applied to a large number of applications [1]. The success of applying SLB often relies on predictive models that can also depend on the accuracy of input data from a benchmarking study; both factors can be systematically improved. Furthermore, if the calculation is iterative, the lack of a dynamic means of allocating tasks can be accounted for in SLB schemes by redistributing work between iterations.

SC12, November 10-16, 2012, Salt Lake City, Utah, USA

In this paper we examine parallel load-balancing schemes applied to a quantum chemistry method, the fragment molecular orbital (FMO) method implemented in the quantum chemistry code GAMESS [24, 25], on the Blue Gene/P [26] supercomputer at Argonne National Laboratory. While FMO has been shown before to achieve superior scalability for fine-grained systems such as water clusters [27], we aim to improve the scalability and efficiency of coarse-grained systems, such as proteins. We analyze why FMO's current DLB scheme is not optimal and propose an SLB alternative.

A key feature of our SLB method is the formulation of a mixed-integer nonlinear optimization (MINLP) problem to model the allocation of processing cores to tasks. The MINLP approach provides great flexibility in modeling the allocation problem realistically. Using nonlinear functions, we can capture complex relationships between running time and the number of processors. At the same time, we can impose integer restrictions on certain variables (e.g., the number of processors). The solution to the MINLP can then be directly used for load balancing in the GAMESS application.

To solve the MINLP arising in our procedure, we use MINOTAUR [28], a freely available MINLP toolkit. It offers several algorithms for solving general MINLPs and can be easily called from different interfaces. Our MINLP formulation requires a few parameters to accurately model the performance; these are obtained by collecting benchmarking data about the application and solving a fitting problem. We describe these methods in Section IV. Our experiments demonstrate that both the fitting problem and the MINLP problem can be solved quickly on a single core, and the resulting allocations lead to significant savings in the run time of the GAMESS application.

The DLB and SLB comparison is done on the receptor-ligand system Aurora-A kinase and inhibitor shown in Fig. 1 (A). We demonstrate the performance of our method on a large protein system (see Fig. 1 (C)) using all 40 racks (163,840 cores) on Argonne's Blue Gene/P.

II. FRAGMENT MOLECULAR ORBITAL METHOD

Ab initio quantum chemistry methods are, in principle, applicable to any molecular system, though the computational cost increases steeply with the system size. Even the simplest restricted Hartree-Fock (RHF) method scales approximately cubically with the system size. There are ongoing efforts to reduce the scaling of quantum-mechanical (QM) methods [29, 30] and parallelize them efficiently [31-38]. See, for example, the linearly scaling method developed by Challacombe and Schwegler [39], and the adaptive multiresolution method developed by Harrison, et al. [40].

Alternatively, fragment-based methods [41, 42] (which divide the system into fragments) can dramatically reduce computational cost, increase stability of calculations, and provide additional information on properties of fragments and their interactions. Algorithmically, fragmentation results in division of one large calculation into many small and nearly independent subtasks or loosely coupled ensemble calculations. As a result, fragmentation methods are efficient for performing quantum-mechanical calculations on supercomputers.

One of the fragment-based methods is the FMO method [43], which has been interfaced with many QM methods and successfully applied to chemical systems such as proteins, DNA, silicon nanowires, and ionic liquids [44]. FMO has been implemented in GAMESS [45] and parallelized with the generalized distributed data interface (GDDI) [46-48]. In FMO, each fragment electronic state is computed in the potential exerted by all the others. Starting from an initial guess, fragment calculations that update the embedding potential are iterated until self-consistency is achieved. Subsequently, fragment pair calculations are performed in the embedding potential. The fragmentation, which is usually chemically motivated for rapid convergence, fixes the parallel domain decomposition at the outset. The basic FMO equation has the form

$$E = \sum_{i=1}^{F} E_i + \sum_{i>j}^{F} \left( E_{ij} - E_i - E_j \right), \qquad (1)$$

where $F$ is the number of fragments and $E_i$, $E_{ij}$ are the energies of fragment (monomer) $i$ and fragment pair (dimer) $ij$, respectively. These energies are assembled according to Eq. (1) to give the total energy and other properties of the system.

GDDI is a two-level parallelization scheme, which can be thought of as coarse-grained parallelism since all CPU cores are divided into a few groups. At the higher intergroup level the load balancing is accomplished by assigning fragments or fragment pairs to GDDI groups. At the lower intragroup level, the load balancing is accomplished by assigning some integral workload to individual CPU cores within a group. Various implementations of GDDI exist, of which the main ones are (1) UNIX socket-based, whereby each CPU core runs a GAMESS process and communicates over TCP/IP via sockets, and (2) MPI-based, where MPI communicators are created for groups. This two-level parallelization has been successful in obtaining up to about 90% of the perfect scalability (i.e., 90-fold speedup on a 100-fold increase in the number of cores) [48] on PC clusters with 128 CPUs connected by a low-end network (FastEthernet). FMO/GDDI has subsequently been used on larger computer systems such as the AIST supercluster [49]. More recently, FMO/GDDI has been successfully run for large water clusters on 131,072 CPU cores on Argonne's Blue Gene/P [27]. Our current MPI-based implementation on Blue Gene/P comprises compute- and data-server process pairs, so that half of all CPU cores are used for QM calculations, while the other half handle communications and distributed memory processing. We report wall-clock timings for GAMESS runs against the total number of CPU cores.

In this paper we apply FMO to two protein-ligand systems. All benchmarking and tuning of both DLB and heuristic SLB (HSLB) schemes have been done on Aurora-A kinase with the inhibitor phthalazinone shown in Fig. 1 (A). Aurora kinases are essential for cell proliferation and a major target in designing new anti-cancer drugs. The system is of moderate size: 155 fragments (154 amino acids and 1 ligand), with the total number of atoms equal to 2,604, computed at the RHF level of theory with the 6-31G* basis set. The production run was done for ovine COX-1 complexed with ibuprofen shown in Fig. 1 (C). The system consists of 17,767 atoms divided into 1,093 fragments. For this work, we used the distributed memory storage of fragment densities [27]. Various tasks, including the fragmentation of proteins, structure checking, the generation of GAMESS input for FMO calculations, and the visualization of results, were performed by the FMOtools suite of Python programs [50].

Figure 1: (A) A schematic view of the structure of Aurora-A kinase complexed with inhibitor phthalazinone in cyan color (PDB code: 3P9J). (B) Ball-and-stick representation of the inhibitor. The system consists of 2,604 atoms divided into 155 fragments. (C) A schematic view of the structure of prostaglandin H(2) synthase-1 (COX-1) in a complex with ibuprofen in cyan color (PDB code: 1EQG). (D) Ball-and-stick representation of ibuprofen. The system consists of 17,767 atoms divided into 1,093 fragments.

III. LOAD BALANCING IN FRAGMENT MOLECULAR ORBITAL METHOD

In the FMO method, a system is first subdivided into fragments. The proteins considered in this paper are divided naturally into amino acid residue fragments (at the Cα atoms) using the FMOgen tool in the FMOtools package [50]. We assumed the standard protonation state at pH 7. The key issue for load balancing is that amino acid residues vary in size from the smallest with 7 atoms (glycine) to the largest with 24 atoms (tryptophan). The accuracy of FMO [44] is determined by the fragmentation, and hence fragments should not be very small. In other words, the number and the size of fragments are determined by the underlying chemistry and should not be modified merely to improve the efficiency of parallelization. The number of fragments and their sizes are therefore considered fixed for the purposes of parallelizing the calculations.

The size of a fragment greatly affects quantum chemistry calculations because the calculation cost tends to scale as a high power of the system size. For example, RHF scales as $N^3$ and coupled-cluster with perturbative triples (CCSD(T)) scales as $N^7$. The electronic state of some fragments can be frozen [51]. Still more variation in task sizes can arise from having different levels of theory and basis set for different regions of the system, as in the multilayer FMO method [52]. As the methodology of FMO becomes increasingly sophisticated, the time to solution and the scalability of individual fragment calculations become harder to model. An example of the variation in scalability of fragment calculations as a function of size is shown in Fig. 2.

Figure 2: Scalability of FMO fragments in the Aurora-A kinase and inhibitor system on Blue Gene/P. The smallest fragment (Gly amino acid residue) and the largest fragment (inhibitor) are shown with distinct markers. The data points were fitted, and performance models for each fragment are shown. The cores represent the computational processes in GDDI. The scale of the x-axis is logarithmic.

A primary factor in the cost of quantum chemical calculations is the number of basis functions (which is roughly proportional to the number of atoms). For FMO specifically (assuming the simple RHF level), important factors also include (1) the number of self-consistent field (SCF) iterations needed to achieve convergence; (2) the fragment packing density, namely, the number of fragments close to a given fragment, which strongly affects the computational time for the embedding potential; and (3) the fragment packing density. The latter has a large impact on dimer calculations owing to the use of electrostatic approximations (described elsewhere [44]). Furthermore, factors (1)-(3) strongly interact.
For instance, the fragment packing density affects the SCF convergence, which, in turn, also depends on the charge, spin state, and the initial guess of the electron density. In addition, the scaling and parallel efficiency of the code are a complex function of these factors; for instance, the relative fraction of sequential steps, such as matrix diagonalizations, is strongly affected by the choice of SCF convergence method, as well as by the number of SCF iterations (because the embedding potential is computed once before SCF). All these factors make the modeling of the functional dependence of the timing upon the fragment size a formidable task.

Once a system is split into fragments, FMO calculations can be performed by using the algorithm shown in Fig. 3. A detailed discussion of the algorithm is given elsewhere [48]. Here, we describe it briefly. At the coarse-grained DLB level, fragments are assigned to groups of CPU cores. In conjunction with MPI, GDDI can generate processor groups as shown in Fig. 3, line 3 (MPI_COMM_SPLIT function). Currently, the default option in GAMESS creates processor subgroups of uniform size. Each group performs single-point fragment calculations, assigned dynamically (see Fig. 3, line 7). Throughout this paper the theory is RHF. The output of such an RHF calculation is the fragment density in the Coulomb field of all fragments (Fig. 3, line 10). Since the new density changes the field, the process must be repeated until self-consistency is achieved. This process involves exchange of fragment densities among the groups by putting generated densities in a DDI global array (DDI_put, Fig. 3, line 12). The fragment densities are accessed via DDI_get inside SCF(i) and SCF(i,j) in order to compute the embedding potential, lines 10 and 23, respectively. The iterative process is sometimes referred to as the self-consistent charge (SCC) or monomer SCF step, corresponding to the first term of the energy expansion in equation (1) with RHF theory. In the final step (Fig. 3, lines 17-26), fragment monomer densities are used to construct dimers from all pairs of monomers, constituting a second round of larger RHF calculations. However, the dimer step is not iterated to self-consistency with respect to the embedding potential.
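The way the monomer and dimer energies produced by these two stages combine in Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name `fmo_total_energy`, the dictionary layout, and the energy values are assumptions made for the example and are not taken from GAMESS.

```python
from itertools import combinations

def fmo_total_energy(monomer, dimer):
    """Assemble the two-body FMO energy of Eq. (1).

    monomer: dict mapping fragment index i -> E_i
    dimer:   dict mapping (i, j) with i > j -> E_ij
    """
    total = sum(monomer.values())
    # combinations() yields each unordered pair once, with j < i.
    for j, i in combinations(sorted(monomer), 2):
        total += dimer[(i, j)] - monomer[i] - monomer[j]
    return total

# Hypothetical energies (arbitrary units) for a three-fragment system.
monomer = {1: -10.0, 2: -20.0, 3: -30.0}
dimer = {(2, 1): -30.5, (3, 1): -40.25, (3, 2): -50.125}
energy = fmo_total_energy(monomer, dimer)
```

Each dimer term contributes only the pair interaction energy, which is why the monomer energies are subtracted inside the second sum.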
    // Initialize variables
    1: number_of_fragments=input();
    2: number_of_groups=number_of_fragments/3;
    3: DDI_group=DDI_group_create(number_of_groups,DDI_world);
    // Monomer loop
    4: do {
    5:   for (i=1; i<=number_of_fragments; i++) {
    6:     DDI_scope(DDI_world);
    7:     mytask=dynamic_load_balancing(DDI_world);
    8:     if (mytask==i) {
    9:       DDI_scope(DDI_group);
    10:      fragment_density[i]=SCF(i);
    11:      DDI_scope(DDI_world);
    12:      DDI_put(fragment_density[i]);
    13:    }
    14:  }
    15:  DDI_sync(DDI_world);
    16: } while (fragment_density[]!=converged);
    // Dimer loop
    17: for (i=1; i<=number_of_fragments; i++) {
    18:   for (j=1; j<i; j++) {
    19:     DDI_scope(DDI_world);
    20:     mytask=dynamic_load_balancing(DDI_world);
    21:     if (mytask==(i,j)) {
    22:       DDI_scope(DDI_group);
    23:       two_fragment_density[i,j]=SCF(i,j);
    24:     }
    25:   }
    26: }

Figure 3: Pseudo-code of FMO calculations for dynamic load balancing.

For FMO, three types of load balancing have been attempted prior to this work, and we suggest an efficient modification of one of them, the static load balancing [48]. The alternative to SLB is DLB; in addition, there is semidynamic load balancing (SDLB) [49]. In DLB, an efficient means to improve efficiency is the large-jobs-first strategy [48]. This strategy considerably reduces the synchronization lag at the end of calculations because the smallest tasks are done last. DLB, in our experience, performs satisfactorily when the ratio of the total number of cores to the number of fragments is not very high (roughly 16 for our case, but it may vary considerably). Using this ratio and recalling that the number of fragments is fixed, DLB may be applied to a protein with 400 residues with good results on up to roughly 6,400 CPU cores (400 fragments times 16 cores per fragment). Adding more cores may result in a deterioration of the performance. The parallelization efficiency may drop because the calculations of small fragments cannot be efficiently parallelized on a large number of cores allocated under the equal partitioning scheme of DLB.
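The effect of the large-jobs-first strategy on the synchronization lag can be illustrated with a small simulation. The sketch below is not the GDDI implementation; it assumes identical groups, a shared task queue, and hypothetical task times, and the name `simulate_dlb` is introduced only for this example.

```python
import heapq

def simulate_dlb(task_times, n_groups, largest_first=True):
    """Simulate dynamic assignment of tasks to identical groups.

    Each group takes the next task from a shared queue as soon as it
    becomes idle. Returns (makespan, idle), where idle is the total time
    groups spend waiting at the final synchronization point.
    """
    queue = sorted(task_times, reverse=True) if largest_first else list(task_times)
    finish = [0.0] * n_groups  # min-heap of times at which groups become free
    heapq.heapify(finish)
    for t in queue:
        free_at = heapq.heappop(finish)      # earliest idle group
        heapq.heappush(finish, free_at + t)  # it runs the next task
    makespan = max(finish)
    idle = sum(makespan - f for f in finish)
    return makespan, idle

# Hypothetical per-fragment times (seconds): four small tasks, one large.
times = [1.0, 1.0, 1.0, 1.0, 4.0]
in_order = simulate_dlb(times, 2, largest_first=False)
sorted_first = simulate_dlb(times, 2, largest_first=True)
```

With the large task scheduled last, one group idles while the other finishes it; scheduling it first lets the small tasks fill the gap.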
An improvement on this DLB problem has been achieved with SDLB, in which a handful of the largest fragment calculations are performed using SLB, while the rest are done with DLB (after the CPU cores participating in SLB finish, they also join in the DLB calculations). However, such a strategy is useful mainly in cases where there are only a few large fragments and the total number of CPU cores is not high; otherwise the problems mentioned above cannot be avoided. An efficient solution is given by the heuristic static load balancing (HSLB) method proposed in Section IV. The main idea behind HSLB is to customize GDDI group sizes to the fragment sizes. Since we solve an optimization problem heuristically, it can easily adapt to handle different numbers of CPUs and fragments.

The number of processor groups used in FMO calculations can vary from one to the number of fragments. Fig. 4 depicts the impact of the group count on the scalability of FMO for a single SCC iteration of a system with 155 fragments. In the case of a single GDDI group, each fragment calculation is executed on all CPU cores. Clearly, all but the largest fragments utilize the large processor count inefficiently, and the overall calculation has a low scalability. On the other hand, the 155-group calculation, in which there is a group for every fragment, exhibits improved scalability. The current default choice assumes three fragments per group, yielding 52 groups in this system.

The difference in scalability and wall-clock time for different group counts is explained in Figs. 5 and 6. While the synchronization time shown is averaged over all GDDI groups, the efficiency is computed for each fragment separately and then averaged over all fragments.
Thus, the efficiency, $W_i^n$, of fragment $i$, as a function of the number of CPU cores, is computed as

$$W_i^n = \frac{T_i^{n_0} / T_i^{n}}{n / n_0}, \qquad (2)$$

where $n_0$ is the reference value of the number of CPU cores ($n_0 = 2$, and $T_i^{n_0}$ was obtained by extrapolation), $n$ is the actual number of CPU cores, and $T_i^{n_0}$ and $T_i^{n}$ are the wall-clock times to compute the energy of fragment $i$ in FMO on $n_0$ and $n$ CPU cores, respectively.
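As a worked instance of Eq. (2), the following minimal Python sketch computes the per-fragment efficiency and its average over fragments. The function names and the timing values are hypothetical and serve only to illustrate the arithmetic.

```python
def parallel_efficiency(t_ref, t_n, n, n_ref=2):
    """Eq. (2): speedup relative to n_ref cores divided by the ideal n / n_ref."""
    return (t_ref / t_n) / (n / n_ref)

def average_efficiency(ref_times, times, n, n_ref=2):
    """Average the per-fragment efficiencies over all fragments."""
    effs = [parallel_efficiency(tr, t, n, n_ref) for tr, t in zip(ref_times, times)]
    return sum(effs) / len(effs)

# Hypothetical timings (seconds): reference on 2 cores, measured on 64 cores.
w_large = parallel_efficiency(100.0, 4.0, 64)   # speedup 25 of an ideal 32
w_small = parallel_efficiency(10.0, 1.25, 64)   # speedup 8 of an ideal 32
w_mean = average_efficiency([100.0, 10.0], [4.0, 1.25], 64)
```

In this made-up example the larger fragment keeps most of its efficiency on 64 cores while the smaller one does not, which is the behavior the averaged curves summarize.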

The data in Fig. 5 and Fig. 6 can be used to explain why the optimum group count with DLB is between 1 and 155. For example, the synchronization time tends to increase with the group count, starting at zero seconds in the case of a single group. However, computational efficiency also tends to increase with the group count as smaller groups encounter lower parallel overheads. Therefore, an optimal group count can be obtained only by finding the right balance between the time spent in synchronization and that gained by parallelism. In addition, we must ensure that the variance in time taken by different fragments is minimized. These times in turn depend on hardware characteristics: the number of cores, CPU type, and the network type of the system.

Figure 4: Wall-clock time to finish a single FMO SCC iteration with different load balancing schemes. The dataset is for the Aurora-A kinase and inhibitor system. The calculations are done at the RHF-D level of theory and 6-31G* basis set on Blue Gene/P. The scale of the y-axis is logarithmic.

Figure 5: Average synchronization time among fragments accumulated during the first FMO SCC iteration. For DLB with one group, the synchronization time is equal to 0 seconds, but because of the log scale it is shown as 1 second. The scale of the y-axis is logarithmic.

Figure 6: Parallel efficiency averaged over fragments during the first FMO SCC iteration for different load balancing schemes on Blue Gene/P. The dataset is for the Aurora-A kinase and inhibitor system.

IV. HEURISTIC STATIC LOAD-BALANCING ALGORITHM

Our heuristic static load-balancing method consists of four steps. First, we collect benchmarking data related to the compute time of fragments. Second, we solve for the optimal parameters by a least-squares method based on our chosen scalability model. Third, we solve an integer optimization problem in order to obtain an optimal allocation of cores. Fourth, we allocate the optimal number of cores obtained from the optimization to run FMO in static load-balancing mode.
With a suitable model for the compute time, one can apply this four-step procedure to any other coarse-grained application. Before describing each of these steps for our application, we list in Table I the notation used to denote variables and parameters in our models.

Table I. List of variables and parameters used in models described in Section IV.

    Symbol                          Description
    $\mathbb{R}_+$                  Set of positive real numbers.
    $F$                             Total number of tasks (fragments) among which we want to allocate available cores.
    $N$                             Total number of cores available for allocation.
    $n_i$                           Number of cores allocated for processing task $i$.
    $T_i(n_i)$                      Performance function that models the time taken to process task $i$ by using $n_i$ cores.
    $T_i^{scal}(n_i)$               Scalable component of the function $T_i(n_i)$.
    $T_i^{serial}(n_i)$             Serial component of the function $T_i(n_i)$.
    $T_i^{nonlin}(n_i)$             Component of the function $T_i(n_i)$ other than $T_i^{scal}(n_i)$ and $T_i^{serial}(n_i)$.
    $D_i$                           Total number of data points available for creating the performance function model for fragment $i$.
    $a_i, b_i, c_i, d_i$            Parameters associated with the performance function, $T_i(n_i)$, of task $i$.
    $\eta$                          Wall-clock time obtained from solving the allocation problem.
    $y_{ij}$                        Observed wall-clock time in the $j$-th run of fragment $i$, $j = 1, \dots, D_i$, in the benchmarking stage.
    $n_{ij}$                        Number of cores allocated in the $j$-th run of fragment $i$, $j = 1, \dots, D_i$, in the benchmarking stage.

A. Performance Model

Choosing an appropriate performance model is one of the most important steps in designing a successful SLB algorithm. Over the years many performance models have been developed [53]. Many parallel performance models begin by identifying sequential and parallel components of the execution time in accordance with Amdahl's law. They try to capture the salient features of the calculation in terms of the key parameters of the problem. For the FMO application considered here, the key feature is the coarse-grained parallelism, which can be captured by selecting mathematical models for the run time of each fragment independently. In this work, we use the nonlinear model

$$T_i(n_i) = T_i^{scal}(n_i) + T_i^{nonlin}(n_i) + T_i^{serial}(n_i) = \frac{a_i}{n_i} + b_i n_i^{c_i} + d_i, \quad i = 1, \dots, F, \qquad (3)$$

where $T_i(n_i)$ represents the wall-clock time to compute the $i$-th fragment as a function of $n_i$, the number of processor cores allocated to process it. The three components of $T_i(n_i)$ are described next.

The quantity $T_i^{scal}(n_i)$ represents the component of the wall-clock time with perfect (or linear) scalability. It is a monotonically decreasing function that asymptotically approaches zero. The quantity $T_i^{serial}(n_i)$, on the other hand, represents the time spent in the nonparallelized component of the application. It is independent of the number of cores $n_i$ and includes any purely serial part of code. From the mathematical point of view it is a constant that defines the minimum value of $T_i(n_i)$ (ignoring $T_i^{nonlin}(n_i)$). As $n_i$ increases, $T_i^{serial}$ is expected to dominate $T_i(n_i)$.

The quantity $T_i^{nonlin}(n_i)$ represents the component of the wall-clock time that is not described by either $T_i^{scal}(n_i)$ or $T_i^{serial}(n_i)$. It represents the time spent in code that is only partially parallelized or depends on $n_i$ in a way more complicated than the other two components. An example of a partially parallel component of our application is the diagonalization of the Fock matrix in the self-consistent field (SCF) method. Generally, $T_i^{nonlin}(n_i)$ may include time spent in activities such as initialization, communication, and synchronization. Our choice of the form of $T_i^{nonlin}(n_i)$ gives our model the ability to account for all these components without constraining it to be an increasing or decreasing function. The sign of the parameters $b_i$ and $c_i$ determines the shape of the function, and consequently every fragment may have a different shape of $T_i^{nonlin}(n_i)$.

The functional form of $T_i(n_i)$ seems to make sense both mathematically and from the viewpoint of Amdahl's law. From the mathematical perspective, one component of $T_i(n_i)$ decreases, while another increases with $n_i$. The function may increase or decrease for different values of $n_i$ depending on the dominating component for that number of cores. Two real examples of $T_i(n_i)$ are illustrated in Fig. 2, where the probed range of the number of cores is not large enough to observe a complex behavior and $T_i(n_i)$ is a smoothly decreasing function. From the perspective of Amdahl's law, in the absence of the complicating component $T_i^{nonlin}(n_i)$, $T_i^{scal}(n_i)$ accounts for the largest contribution when $n_i$ is small, while $T_i^{serial}$ is the largest contribution to $T_i(n_i)$ for large $n_i$.

B. Fitting Data

We estimate the parameters $a_i$, $b_i$, $c_i$, and $d_i$ used in Eq. (3) by fitting the values of wall-clock time of each fragment over the first SCC iteration for different CPU core groupings. In other words, we perform calculations of each fragment in the embedding potential, varying the number of cores per GDDI group. The timings are collected as a function of the number of cores per group, and we fit the coefficients.
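The performance model of Eq. (3), whose coefficients are being fitted here, is simple enough to write down directly. The sketch below is illustrative only: the function name `model_time` and the parameter values are assumptions, not fitted values from the paper.

```python
def model_time(n, a, b, c, d):
    """Eq. (3): T(n) = a / n + b * n**c + d for one fragment."""
    t_scal = a / n        # perfectly scalable part, decays toward zero
    t_nonlin = b * n ** c  # partially parallel part, shape set by b and c
    t_serial = d          # constant serial part
    return t_scal + t_nonlin + t_serial

# Hypothetical parameters chosen only to show how the terms trade off.
params = dict(a=1000.0, b=0.5, c=0.5, d=2.0)
t_small = model_time(4, **params)       # dominated by a / n
t_mid = model_time(100, **params)
t_large = model_time(10000, **params)   # dominated by b * n**c
```

With these made-up values the modeled time first falls and then rises again as the nonlinear term takes over, which shows how the model can represent non-monotonic behavior.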
In the future we plan to examine the possibilities of using several SCC iterations for the fitting.

For the $i$-th fragment, we obtain the best fit by solving the least-squares problem

$$\min_{a_i, b_i, c_i, d_i} \sum_{j=1}^{D_i} \left( y_{ij} - \frac{a_i}{n_{ij}} - b_i n_{ij}^{c_i} - d_i \right)^2 \qquad (4)$$

subject to $a_i, b_i, c_i, d_i \in \mathbb{R}_+$, where $y_{ij}$ is the observed value of time taken in solving for fragment $i$ in the $j$-th run, when $n_{ij}$ cores are allocated to it. $D_i$ is the number of different GDDI group sizes tried in the fitting procedure (in this paper, $D_i$ varied from 3 to 7, depending on the system).

The objective function of the optimization problem (4) is in general not convex, and there may be several locally optimal solutions of the problem. Since nonlinear optimization algorithms are iterative, selecting a different starting point may lead the solver to a different local solution. We experimented with different starting solutions and observed that even though the parameter values may differ, the solution value of problem (4) did not vary significantly. More important was the observation that the differences in parameter values did not translate into significant differences in the optimal allocation of cores that we calculate in the next step.

We have constrained the variables in our fitting problem Eq. (4) to be nonnegative even though doing so is not necessary mathematically. It makes sense for parameters $a_i$, $b_i$, and $d_i$ to be positive because they represent values of time. It is less obvious what the constraints for $c_i$ should be. In general, $T_i^{nonlin}(n_i)$ can be increasing or decreasing, but we prefer a positive $c_i$ because our application is highly scalable. The total time does not increase even when the number of cores used in production runs is much larger than that in trial runs for gathering data. Thus, a positive value of $c_i$ ensures that our model has a better fit even when we extrapolate it to a large number of cores. Examples of fitted $a_i$, $b_i$, $c_i$, and $d_i$ can be seen for the smallest and the largest fragments in Fig. 2.

Since the values $y_{ij}$ are gathered from actual runs on the system, it is important to judiciously choose trial values of $n_{ij}$ in the data-gathering stage. There is an obvious trade-off between the time taken to obtain $y_{ij}$ and the quality of the model. Since the solution procedure in GAMESS is iterative and the nature of work is similar for all iterations, we can model the functions using time observations for a single iteration only. This saves time without sacrificing accuracy. To obtain good estimates of $a_i$, $b_i$, $c_i$, and $d_i$, we recommend sampling $n_{ij}$ from a large range of core counts: from a few to thousands for each fragment. In order to avoid over-fitting, the number of samples for each fragment should be greater than four, the number of fitted parameters. We used eight samples in our experiments. The number of samples should obviously increase with the level of noise in the application and the number of parameters to be estimated. In general, one should judiciously pick samples based on a priori knowledge of the tasks.
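The fitting step of problem (4) can be approximated without a nonlinear solver, as in the sketch below. It swaps the general nonlinear least-squares solve for a plain scan over candidate values of c: for each fixed c the model is linear in a, b, and d, so those follow from the normal equations, and candidates with a negative coefficient are discarded to mimic the nonnegativity constraints. The helper names, the grid, and the synthetic data are all assumptions made for illustration.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None  # singular system, e.g. two identical basis columns
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = aug[r][3] - sum(aug[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = s / aug[r][r]
    return x

def fit_fragment(cores, times, c_grid):
    """Fit (a, b, c, d) of Eq. (3) by scanning c and solving for a, b, d."""
    best = None
    for c in c_grid:
        basis = [(1.0 / n, n ** c, 1.0) for n in cores]
        normal = [[sum(u[p] * u[q] for u in basis) for q in range(3)]
                  for p in range(3)]
        rhs = [sum(u[p] * y for u, y in zip(basis, times)) for p in range(3)]
        coef = solve3(normal, rhs)
        if coef is None or min(coef) < 0.0:
            continue  # keep only nonnegative parameters, as in problem (4)
        a, b, d = coef
        sse = sum((y - a / n - b * n ** c - d) ** 2 for n, y in zip(cores, times))
        if best is None or sse < best[0]:
            best = (sse, a, b, c, d)
    return best

# Synthetic benchmark from hypothetical true parameters, eight core counts.
cores = [2, 4, 8, 16, 32, 64, 128, 256]
times = [1000.0 / n + 0.5 * n ** 0.5 + 2.0 for n in cores]
c_grid = [k / 20 for k in range(1, 41)]  # c = 0.05, 0.10, ..., 2.00
sse, a, b, c, d = fit_fragment(cores, times, c_grid)
```

A grid scan avoids the dependence on the starting point mentioned above, at the price of resolving c only to the grid spacing; a general-purpose nonlinear solver would be the natural choice for real, noisy data.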
Lacking such knowledge, we began by dividing the available cores equally among all groups. This approach proved satisfactory for systems with similar-sized fragments (along with a consistent theory and basis set). In cases where, for example, a ligand is much larger than the largest amino acid, a more sophisticated allocation for sampling may be needed. We also note that the recorded times do not include FMO initialization and intergroup synchronization time, but they do include all intragroup computation and communication, including synchronization.

Our procedure of first collecting data and then rerunning the full application from scratch can be improved. We use our simple procedure to demonstrate the effectiveness of using an optimal allocation of cores. Our procedure can be modified with little effort to reuse more information from the data collection stage for the solving stage.

C. Formulating the Optimization Problem

Once we have identified an appropriate performance model and obtained values of all parameters from the previous steps, we can formulate an optimization problem to find the optimal allocation of cores. The decision variables that we seek to optimize are the number of processors, $n_i$, to be allocated to each fragment $i \in \{1, \dots, F\}$. The choice of objective that we seek to minimize or maximize depends on the preference of the user. To minimize the total wall-clock time of the application, the following min-max function can be used:

$$\min_{n} \max_{i = 1, \dots, F} T_i(n_i). \qquad (5)$$

Alternatively, the objective function can be just the sum of times used by each task,

$$\min_{n} \sum_{i=1}^{F} T_i(n_i). \qquad (6)$$

One can also seek to maximize the minimum time used by a task. Like the min-max criterion, the max-min criterion also seeks to obtain a fair distribution of cores by taking away allocations from the fastest tasks. It is written as

$$\max_{n} \min_{i = 1, \dots, F} T_i(n_i). \qquad (7)$$

The physical restrictions of the system can be modeled by adding constraints to the optimization problem; for example, the number of cores used in calculations cannot exceed the total number of available cores, $N$,

$$\sum_{i=1}^{F} n_i \le N. \qquad (8)$$

We can also have constraints based on the user's preferences; e.g., the user may wish to minimize the wall-clock time with an additional constraint that the total core time must be below a threshold $T$:

$$\sum_{i=1}^{F} T_i(n_i) \le T. \qquad (9)$$

Some constraints may be needed to make the model amenable to the solver. In particular, most solvers require the derivatives of the objective and constraints to be continuous. The min-max objective function should therefore be replaced by an objective of minimizing a new variable, say $\eta$, and additional constraints must be introduced to ensure $\eta$ is no less than each $T_i(n_i)$. The full model is

$$\begin{aligned}
\min \quad & \eta \\
\text{subject to} \quad & \sum_{i=1}^{F} n_i \le N, \\
& \frac{a_i}{n_i} + b_i n_i^{c_i} + d_i \le \eta, \quad i = 1, \dots, F, \\
& n_i \ge 0, \ \text{integer}, \quad i = 1, \dots, F.
\end{aligned} \qquad (10)$$
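To make the min-max model (10) concrete, the following sketch allocates cores with a simple greedy rule instead of an MINLP solver: every task starts with one core, and each remaining core goes to whichever task is currently slowest. This is a stand-in heuristic, not the method used in the paper, and it is only a sensible approximation when each $T_i$ is non-increasing in $n_i$. The function name and the parameter values are hypothetical.

```python
import heapq

def greedy_minmax_allocation(params, total_cores):
    """Greedy core allocation for the min-max objective.

    params: list of (a, b, c, d) tuples, one per task, for Eq. (3).
    Returns (allocation, eta) with eta = max_i T_i(n_i).
    """
    def task_time(i, n):
        a, b, c, d = params[i]
        return a / n + b * n ** c + d

    if total_cores < len(params):
        raise ValueError("need at least one core per task")
    alloc = [1] * len(params)
    # Max-heap on current task time (negated for heapq's min-heap).
    heap = [(-task_time(i, 1), i) for i in range(len(params))]
    heapq.heapify(heap)
    for _ in range(total_cores - len(params)):
        _, i = heapq.heappop(heap)  # current slowest task
        alloc[i] += 1
        heapq.heappush(heap, (-task_time(i, alloc[i]), i))
    eta = max(task_time(i, n) for i, n in enumerate(alloc))
    return alloc, eta

# Two hypothetical tasks with purely scalable cost (b = d = 0), 11 cores.
params = [(100.0, 0.0, 1.0, 0.0), (10.0, 0.0, 1.0, 0.0)]
alloc, eta = greedy_minmax_allocation(params, 11)
```

In this toy case the large task receives ten cores and the small one a single core, so both finish at the same time; an equal split would leave the small task's cores idle while the large task runs.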

We considered the three objective functions described above, together with constraint (8), in our models. We observed in our experiments that the min-max function (5) outperforms the other objectives, which makes sense from the viewpoint of minimizing the overall wall-clock time. Minimizing total time, at the other extreme, may lead to a solution where one fragment takes an exceptionally long time to solve (Fig. 7), thus keeping the other processors waiting.

Figure 7: Allocation of different solvers: (A) minimizing total time, (B) maximizing the minimum group time, and (C) minimizing the maximum group time. The height of each column represents the time to compute one fragment, and the width of each column represents how many cores were assigned. The dataset is for the complex of Aurora-A kinase and its inhibitor, which was collected on 1024 cores of Blue Gene/P for FMO at the RHF/6-31G* level.

D. Solving the MINLP Model

MINLP problems, of which the optimization problem Eq. (10) is a special case, are NP-hard in general. Certain specific classes of MINLP, such as the single-constraint resource-constrained problems with nonincreasing objective functions, can be solved in polynomial time [54], but they require customized algorithms. Hence we consider algorithms for general MINLPs only. The algorithms to solve general MINLPs are usually based on the branch-and-bound method [55]. These methods are guaranteed to provide an optimal solution or show that none exists. In addition to the number of variables and constraints, the time required to solve these problems depends on the type of functions used in the objective and constraints. For instance, if all the nonlinear functions are convex, then a local solution of the continuous relaxation is also its global solution. Several specialized algorithms exploit this fact and other useful properties of convex functions [55-60]. On the other hand, if any function is not convex, then a local solution of the continuous relaxation does not give a valid bound on the objective value. In this case, one needs to further relax the continuous problem by introducing new variables and modifying the constraints [61, 62].

We wrote our optimization problem in the AMPL [63] modeling language. AMPL enables users to write optimization models using simple mathematical notation. It also provides derivatives of nonlinear functions automatically, and it can be used with several different solvers. To solve the problem, we used the open-source solver toolkit MINOTAUR [28]. MINOTAUR offers different solvers based on the algorithms mentioned above and also offers advanced routines to reformulate MINLPs. It provides libraries that can be called from other C++ and FORTRAN codes and hence can be directly called without requiring AMPL. For solving our problem, we use the LP/NLP [56] solver implemented in MINOTAUR. Since the coefficients a, b, c are positive, the nonlinear functions are convex, and this algorithm finds a global solution of the problem. We briefly describe this algorithm next.

The LP/NLP algorithm is initialized by first creating a linear relaxation of the MINLP. Suppose we have a nonlinear constraint of the form $f(x) \le 0$, where $f$ is a continuously differentiable convex function. A linear relaxation of the constraint is obtained by linearizing around any point $x^k$:

$$\nabla f(x^k)^T (x - x^k) + f(x^k) \le 0. \qquad (11)$$

In general, the more linearization constraints obtained from distinct points, the closer the relaxation is to the original problem. However, a large number of constraints can slow the solver. To mitigate this problem, linearization constraints derived from only a single point are added initially. This point is obtained by solving the continuous nonlinear programming (NLP) problem. We later add linearization constraints for only those nonlinear constraints that are violated significantly by the solution.

After the initial linear programming (LP) relaxation is created, it is added to a list of unsolved sub-problems. The value of the incumbent solution of the MINLP is initialized to infinity. In each step of the algorithm, we remove a sub-problem from the list and solve its linear relaxation using an LP solver.
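The linearization in Eq. (11), from which the LP relaxation is built, can be illustrated on a single constraint of model (10). The sketch below is only an illustration under stated assumptions, not MINOTAUR's implementation: it takes the constraint $f_i(n_i) - \eta \le 0$ with the assumed model $f_i(n) = a_i/n + b_i n^{c_i} + d_i$, requires $a_i, b_i > 0$ and $c_i \ge 1$ so that $f_i$ is convex for $n > 0$, and uses hypothetical function names.

```python
from typing import Tuple

# Hypothetical container: fitted coefficients (a, b, c, d) of one fragment.
Coeffs = Tuple[float, float, float, float]


def model_time(coeffs: Coeffs, n: float) -> float:
    """Assumed performance model f(n) = a/n + b*n**c + d for n > 0."""
    a, b, c, d = coeffs
    return a / n + b * n ** c + d


def model_slope(coeffs: Coeffs, n: float) -> float:
    """Derivative f'(n) = -a/n**2 + b*c*n**(c-1) of the assumed model."""
    a, b, c, _ = coeffs
    return -a / n ** 2 + b * c * n ** (c - 1.0)


def tangent_cut(coeffs: Coeffs, n_k: float) -> Tuple[float, float]:
    """Linearize f(n) - eta <= 0 around n_k, following Eq. (11).

    Returns (slope, intercept) such that the linear constraint reads
    slope * n + intercept <= eta.
    """
    slope = model_slope(coeffs, n_k)
    intercept = model_time(coeffs, n_k) - slope * n_k
    return slope, intercept


def cut_is_violated(cut: Tuple[float, float], n: float, eta: float, tol: float = 1e-9) -> bool:
    """True if the point (n, eta) violates the linearized constraint."""
    slope, intercept = cut
    return slope * n + intercept > eta + tol
```

Because the assumed model is convex, each cut touches $f_i$ at $n_k$ and lies below it elsewhere, so a cut never excludes a point that satisfies the original nonlinear constraint. Adding cuts at more points tightens the relaxation.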
If the solution value is greater than the incumbent, we discard this sub-problem because it does not contain any solution better than the incumbent. If the solution $\hat{x}$ of the LP problem has fractional values, we create two new sub-problems by branching. We choose an integer variable $x_i$ for which $\hat{x}_i$ is fractional. In one sub-problem we add the constraint $x_i \le \lfloor \hat{x}_i \rfloor$. In the other, we add the constraint $x_i \ge \lceil \hat{x}_i \rceil$. These two sub-problems are added to

the list of unsolved sub-problems. If $\hat{x}$ satisfies the integer constraints, we check whether it satisfies all the nonlinear constraints as well. If it is feasible, then we have a new incumbent solution. Otherwise, we add more linearization constraints around $\hat{x}$ of the form shown in Eq. (11), and continue. The algorithm terminates when the list is empty. In MINOTAUR, the LP problems are solved by using the CLP solver [64], and the NLP problems use filterSQP [65]. In the worst case, the algorithm may require solving an exponential (in the number of integer variables) number of LP and NLP problems, but in practice it takes far fewer. For example, the MINLP for 4096 cores took < 180 seconds on one core to solve, and made calls to the LP solver and 2 calls to the NLP solver. For cores, these numbers were 165 seconds, 9883 and 2, respectively.

E. Summary of HSLB Algorithm

Before presenting the results of our experiments, we summarize the four-step HSLB algorithm and discuss some ways of further improving it.

(1) Gather Data: Perform a single SCC iteration for the given molecular system (protein-ligand complex) with FMO by executing GAMESS D times using different total numbers of cores, with suitable choices for D. Collect the running times $y_{ij}$ for each fragment.
(2) Fit: Next, solve F different least-squares problems (4) to determine the coefficients a, b, c, and d in Eq. (3) for each fragment.
(3) Solve: Determine the best allocation by solving the MINLP (10), and obtain the optimal group size $n_i$ for each fragment.
(4) Execute: Execute the complete FMO run with GAMESS, using the group sizes determined in step (3).

This algorithm, being of a general nature, can be improved in several ways for a given application. The data-gathering step (1) can be avoided altogether if reliable benchmarks are already available, for example, from previous experiments. Steps (2) and (3) can be solved by calling a MINLP solver directly from the application, thus avoiding the use of AMPL. The least-squares problem can be solved with a MINLP solver by just calling its nonlinear solver once.
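The fitting step (2) can be sketched in a few lines of Python. This is a simplified stand-in for the nonlinear least-squares problem (4), not the procedure used in this work: it assumes the model $y \approx a/n + b n^{c} + d$, tries a small user-supplied grid of exponents $c$, and for each one solves the resulting linear least-squares problem in $(a, b, d)$ through scaled normal equations. It does not enforce positivity of the coefficients, and the names `solve3` and `fit_fragment` are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve3(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        tail = sum(rows[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (rows[r][3] - tail) / rows[r][r]
    return x


def fit_fragment(
    cores: Sequence[int], times: Sequence[float], exponents: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (a, b, c, d) minimizing the squared error over the trial exponents."""
    best_sse, best_fit = float("inf"), None
    for c in exponents:
        raw = [(1.0 / n, float(n) ** c, 1.0) for n in cores]
        # Scale each basis column to unit maximum to keep the system well conditioned.
        scale = [max(abs(row[j]) for row in raw) for j in range(3)]
        basis = [[row[j] / scale[j] for j in range(3)] for row in raw]
        gram = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
        proj = [sum(r[i] * y for r, y in zip(basis, times)) for i in range(3)]
        scaled = solve3(gram, proj)
        a, b, d = (scaled[j] / scale[j] for j in range(3))
        sse = sum((a / n + b * float(n) ** c + d - y) ** 2 for n, y in zip(cores, times))
        if sse < best_sse:
            best_sse, best_fit = sse, (a, b, c, d)
    return best_fit
```

A call such as `fit_fragment([16, 32, 64, 128, 256, 512], times, [0.5, 1.0, 1.5, 2.0])` would be made once per fragment, with `times` holding that fragment's measured running times. With four unknowns per fragment, at least four distinct core counts are needed for the fit to be meaningful.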
Once the least-squares problem is solved, the MINLP solver can then solve the MINLP of step (3). More improvements are possible if the HSLB procedure is called more than once to reallocate the cores after a few iterations of the complete run. The running time of all iterations can be stored, a better fit obtained, and the MINLP re-solved to obtain a better allocation based on the new data.

In this work, we applied HSLB only to the monomer step in FMO, which is iterative and requires running each monomer calculation typically times. We used DLB for the dimer step, which involves computing each dimer once; thus the benefits of an optimized allocation in HSLB do not merit its application, given the need to do preliminary data gathering. However, in the future it is conceivable to construct a good guess for an optimum node allocation in dimers based on the monomer data, which would accelerate the dimer step as well. The load-balancing problem in dimers is also less severe than in monomers, because the number of dimers for which quantum-mechanical calculations are performed is typically 3-4 times the number of fragments F (dimers that are spatially well separated are computed with a very fast electrostatic approximation) [44].

V. RESULTS AND DISCUSSION

The performance of HSLB is compared to that of DLB with different numbers of groups for the system of Aurora kinase and inhibitor phthalazinone (see Fig. 1 (A) and (B)). This system has 155 fragments. Fig. 4 shows that the HSLB scheme outperforms the DLB schemes by at least a factor of two in the wall-clock time. We also found that some DLB schemes have scalability similar to HSLB. We also make other observations about the performance of HSLB. Fig. 5 shows that HSLB has the lowest synchronization time even on thousands of processors. Since the synchronization time becomes important when a large number of CPU cores are used, HSLB should be preferred for such systems. The HSLB algorithm also shows excellent efficiency, greater than 90% on large numbers of cores, as shown in Fig. 6.
As the number of cores increases, we anticipated that the scalability and efficiency of HSLB might deteriorate. To quantify this deterioration, we tested the performance of HSLB for larger processor counts using a larger problem: COX-1 complexed with ibuprofen (see Fig. 1 (C) and (D)), a total of 1093 fragments and 17,767 atoms. Fig. 8 shows that the COX-1 calculation achieves 80% efficiency averaged over all fragments for the SCC iterations in FMO on 163,840 cores at the RHF, 6-31G* level of theory. The single-point energy calculation takes only ~54 minutes (6+ years on a single core). The results obtained at this computational level strongly suggest that significantly higher processor counts can be efficiently utilized for larger problems.

Figure 8: Ideal and observed scalability curves based on wall-clock time for the first FMO SCC iteration on Blue Gene/P for COX-1 complexed with ibuprofen. All calculations are done in a dual mode that restricts processes to 2 MPI tasks per node. Efficiency averaged over all the fragments is shown for each run.

While HSLB outperforms DLB, it still exhibits a small decline in scalability and efficiency at high processor counts for both the Aurora kinase and COX-1 calculations. This decline may be due to sequential steps in the fragment SCF and to fluctuations in the synchronization time caused by runtime operating system tasks, shared network issues, hardware failure, deficiencies of the performance model or benchmarking data, and so forth. It is commonly understood that, for these reasons, synchronization becomes more problematic as the number of processors increases. Although these fluctuations do not appear in Fig. 5 (only averaged values are shown), use of a low level of theory (RHF, 6-31G*) here has helped uncover the limitations of the HSLB approach by raising the significance of the synchronization time (for density functional theory (DFT) one can expect a better parallel efficiency because of the high scaling of the DFT-specific grid integration). From the data, the operating limits of HSLB on Blue Gene/P would appear to be anywhere from three cores per task up to the point where random computational noise (>100 thousand cores) hampers the ability to predict the time to solution for tasks.

We have identified directions for further improving our load-balancing approach. We observed that the iteration time in our application is not constant but tends to decrease because successive SCC iterations typically require fewer micro-iterations to converge the density. Moreover, this behavior is not uniform over different fragments because they converge at different rates. We propose to apply HSLB adaptively. We can fit scalability curves, obtain the nonlinear equations, and solve for the optimal allocation for all SCC iterations, as described in Section IV. To this end, we have interfaced MINOTAUR directly with GAMESS on Blue Gene/P. This enables us to optimize directly, without making system calls to execute the AMPL model. We have not included results for adaptive HSLB here because our goal is to present the fundamental HSLB concept.
That said, adaptive HSLB offers a promising direction for future development because it combines the efficiency of HSLB with the adaptability of DLB.

VI. CONCLUSIONS

The method development in this paper is an evolution of the parallelization of a complex quantum-mechanical program, GAMESS [24, 25], over dozens of CPU cores with DDI, introduced in 2000 [46], and extended to hundreds with GDDI in 2004 [48], as demonstrated on a powerful supercomputer of that time (in 2005 [49]). The manual variation of the group size in GDDI to optimize its performance, used in a Supercomputing-2005 paper [49], inspired the present work, which we have conducted based on advanced mathematical methods guaranteeing the best allocation for a given number of cores and a molecular system. Although for fine-grained systems (water clusters) the previously developed load balancing has performed well up to about 130,000 CPU cores [27], coarse-grained systems (proteins) cannot be treated with high efficiency on modern petascale computers in the same way. We have shown that the present HSLB approach is twice as fast as the previous DLB method and achieves a parallel efficiency of about 80% on petascale core counts (hundreds of thousands of cores). Thus, from the user perspective, HSLB enables FMO to automatically handle very large problems with diverse fragment sizes.

Many interesting cases fall into the latter category. For example, in the study of photosynthesis, the reaction center [49, 66] features the chlorophyll special pair, which is large and difficult to fragment for chemical reasons (significant electron delocalization across the planar system). Another common situation is found in drug design, where the drug molecules often have atoms with extended conjugation. Such large fragments typically coexist with many small ones, such as explicit water molecules having only three atoms per fragment.
Where DLB based on uniform group sizes would be unable to utilize many cores effectively for such systems, HSLB, by fitting the GDDI group sizes to the fragments, can efficiently utilize CPU core counts in the 100,000 range with negligible overhead. In this sense, HSLB is similar in spirit to the use of preliminary benchmarks in previous work to guess the optimum group sizes [49].

Our current era of petascale computing already has an eye on the coming exascale era, and the development of software capable of efficiently utilizing many thousands or millions of CPU cores is a topic of great interest. FMO accelerated by HSLB on petascale and exascale computers can become a powerful tool for drug and material design [44], realizing the high potential held by quantum-mechanical methods on massively parallel computers.

The present coarse-grained optimization algorithm is not limited to FMO. Many coarse-grained applications can benefit from the present approach. For instance, many other fragment-based methods can be similarly parallelized. As the number of cores increases, the issues of minimizing the synchronization time while retaining a high efficiency will put load-balancing schemes to a highly stressful test. We believe that for coarse-grained applications our HSLB algorithm is a promising and general approach.

ACKNOWLEDGMENT

We thank Dr. R. Loy and ALCF team members for discussions and help related to the paper. We thank Dr. M. Mazanetz from Evotec for providing the PDB structure of the Aurora-A kinase system used in our calculations. DGF thanks the Next Generation Super Computing Project, Nanoscience Program (MEXT, Japan) and the Computational Materials Science Initiative (CMSI, Japan) for financial support and Prof. K. Kitaura for fruitful discussions. The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne") under Contract No. DE-AC02-06CH11357 with the U.S. Department of Energy. The U.S.
Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. This work was also supported by the U.S. Department of Energy through grant DE-FG02-05ER25694.

REFERENCES

[1] C. Xu and F. C. M. Lau, Load balancing in parallel computers: theory and practice. Norwell, MA. Kluwer Academic Publishers.
[2] K. D. Devine, E. G. Boman, R. T. Heaphy, B. A. Hendrickson, J. D. Teresco, J. Faik, J. E. Flaherty, and L. G. Gervasio, "New challenges in dynamic load balancing," Applied Numerical Mathematics, vol. 52, pp.
[3] M. H. Willebeek-LeMair and A. P. Reeves, "Strategies for dynamic load balancing on highly parallel computers," Parallel and Distributed Systems, IEEE Transactions on, vol. 4, pp.
[4] Y. Bejerano, S. J. Han, and L. E. Li, "Fairness and load balancing in wireless LANs using association control," in Proceedings of the 10th annual international conference on mobile computing and networking, New York, NY, 2004, pp.
[5] B. Y. Zhang, Z. Y. Mo, G. W. Yang, and W. M. Zheng, "An efficient dynamic load-balancing algorithm in a large-scale cluster," Distributed and Parallel Computing, pp.
[6] M. J. Zaki, W. Li, and S. Parthasarathy, "Customized dynamic load balancing for a network of workstations," in Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing, Syracuse, NY, 1996, pp.
[7] R. D. Williams, "Performance of dynamic load balancing algorithms for unstructured mesh calculations," Concurrency: Practice and experience, vol. 3, pp.
[8] M. J. Berger and S. H. Bokhari, "A partitioning strategy for nonuniform problems on multiprocessors," Computers, IEEE Transactions on, vol. 100, pp.
[9] H. D. Simon, "Partitioning of unstructured problems for parallel processing," Computing Systems in Engineering, vol. 2, pp.
[10] V. E. Taylor and B. Nour-Omid, "A study of the factorization fill-in for a parallel implementation of the finite element method," International journal for numerical methods in engineering, vol. 37, pp.
[11] M. S. Warren and J. K. Salmon, "A parallel hashed oct-tree N-body algorithm," in Proceedings of the ACM/IEEE Supercomputing 1993 Conference, Portland, 1993, pp.
[12] J. R. Pilkington and S. B. Baden, "Partitioning with spacefilling curves, CSE Technical Report CS94-349," Dept. of Computer Science Engineering, University of California, San Diego, CA, 1994.
[13] A. Patra and J. T. Oden, "Problem decomposition for adaptive hp finite element methods," Computing Systems in Engineering, vol. 6, pp.
[14] J. E. Flaherty, R. M. Loy, M. S. Shephard, B. K. Szymanski, J. D. Teresco, and L. H. Ziantz, "Adaptive local refinement with octree load balancing for the parallel solution of three-dimensional conservation laws," Journal of Parallel and Distributed Computing, vol. 47, pp.
[15] A. Pothen, H. D. Simon, and K. P. Liou, "Partitioning sparse matrices with eigenvectors of graphs," SIAM Journal on Matrix Analysis and Applications, vol. 11, pp.
[16] E. Leiss and H. Reddy, "Distributed load balancing: design and performance analysis," WM Keck Research Computation Laboratory, vol. 5, pp.
[17] G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM Journal on Scientific Computing, vol. 20, p. 359.
[18] Y. F. Hu and R. J. Blake, "An optimal dynamic load balancing algorithm, Technical Report DL-P," Daresbury Laboratory, Warrington, WA4 4AD, UK, 1995.
[19] B. Hendrickson and R. Leland, "A multilevel algorithm for partitioning graphs," in Proceedings of the ACM Supercomputing 1995 Conference, New York, 1995, pp.
[20] G. Cybenko, "Dynamic load balancing for distributed memory multiprocessors," Journal of Parallel and Distributed Computing, vol. 7, pp.
[21] T. Bui and C. Jones, "A heuristic for reducing fill in sparse matrix factorization," in SIAM Conference on Parallel Processing for Scientific Computing, Philadelphia, PA, 1993, pp.
[22] S. H. Bokhari, "On the mapping problem," IEEE Transactions on Computers, vol. 100, pp.
[23] S. H. Bokhari, Assignment problems in parallel and distributed computing, vol. 32. New York, NY. Springer-Verlag.
[24] M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. S., T. L. Windus, M. Dupuis, and J. A. J. Montgomery, "General atomic and molecular electronic structure system," Journal of Computational Chemistry, vol. 14, pp.
[25] M. S. Gordon and M. W. Schmidt, "Advances in electronic structure theory: GAMESS a decade later," in Theory and Applications of Computational Chemistry: The First Forty Years, C. Dykstra, G. Frenking, K. Kim, and G. Scuseria, Eds. Elsevier Science, 2005, pp.
[26] Argonne National Laboratory: Argonne Leadership Computing Facility. Available:
[27] G. D. Fletcher, D. G. Fedorov, S. R. Pruitt, T. L. Windus, and M. S. Gordon, "Large-scale MP2 calculations on the Blue Gene architecture using the Fragment Molecular Orbital method," Journal of Chemical Theory and Computation, vol. 8, pp.
[28] A. Mahajan, S. Leyffer, J. Linderoth, J. Luedtke, and T. Munson. MINOTAUR wiki. Available: (January 16, 2012)
[29] R. Zalesny, M. G. Papadopoulos, P. G. Mezey, and J. Leszczynski, Linear-Scaling Techniques in Computational Chemistry and Physics. New York, NY. Springer.
[30] J. R. Reimers, Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology. Singapore. Wiley.
[31] E. Apra, R. J. Harrison, W. Shelton, V. Tipparaju, and A. Vázquez-Mayagoitia, "Computational chemistry at the petascale: Are we there yet?," in Journal of Physics: Conference Series, 2009, p.
[32] Y. Hasegawa, J. I. Iwata, M. Tsuji, D. Takahashi, A. Oshiyama, K. Minami, T. Boku, F. Shoji, A. Uno, and M. Kurokawa, "First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer," in Proceedings of the ACM/IEEE Supercomputing 2005 Conference, Seattle, 2011, pp.
[33] E. Apra, A. P. Rendell, R. J. Harrison, V. Tipparaju, W. A. deJong, and S. S. Xantheas, "Liquid water: obtaining the right answer for the right reasons," in Proceedings of the ACM/IEEE Supercomputing 2009 Conference, Portland, 2009, p. 66.
[34] K. Kowalski, S. Krishnamoorthy, R. M. Olson, V. Tipparaju, and E. Aprà, "Scalable implementations of accurate excited-state coupled cluster theories: Application of high-level methods to porphyrin-based systems," in Proceedings of the ACM/IEEE Supercomputing 2011 Conference, Seattle, 2011, pp.
[35] Y. Alexeev, R. A. Kendall, and M. S. Gordon, "The distributed data SCF," Computer Physics Communications, vol. 143, pp.
[36] Y. Alexeev, M. W. Schmidt, T. L. Windus, and M. S. Gordon, "A parallel distributed data CPHF algorithm for analytic Hessians," Journal of Computational Chemistry, vol. 28, pp.
[37] M. Krishnan, Y. Alexeev, T. L. Windus, and J. Nieplocha, "Multilevel parallelism in computational chemistry using Common Component Architecture and Global Arrays," in Proceedings of the ACM/IEEE Supercomputing 2005 Conference, Seattle, 2005, pp.
[38] G. Fletcher, "A parallel multi-configuration self-consistent field algorithm," Molecular Physics, vol. 105, pp.
[39] M. Challacombe and E. Schwegler, "Linear scaling computation of the Fock matrix," Journal of Chemical Physics, vol. 106, pp.
[40] R. J. Harrison, G. I. Fann, T. Yanai, Z. Gan, and G. Beylkin, "Multiresolution quantum chemistry: Basic theory and initial applications," Journal of Chemical Physics, vol. 121, p.
[41] M. S. Gordon, S. R. Pruitt, D. G. Fedorov, and L. V. Slipchenko, "Fragmentation methods: a route to accurate calculations on large systems," Chemical Reviews, vol. 112, pp.
[42] S. Hirata, M. Valiev, M. Dupuis, S. S. Xantheas, S. Sugiki, and H. Sekino, "Fast electron correlation methods for molecular clusters in the ground and excited states," Molecular Physics, vol. 103, pp.
[43] K. Kitaura, E. Ikeo, T. Asada, T. Nakano, and M. Uebayasi, "Fragment molecular orbital method: an approximate computational method for large molecules," Chemical Physics Letters, vol. 313, pp.
[44] D. G. Fedorov, T. Nagata, and K. Kitaura, "Exploring chemistry with the Fragment Molecular Orbital method," Physical Chemistry Chemical Physics, vol. 14, pp.
[45] D. G. Fedorov and K. Kitaura, "The importance of three-body terms in the fragment molecular orbital method," Journal of Chemical Physics, vol. 120, pp.
[46] G. D. Fletcher, M. W. Schmidt, B. M. Bode, and M. S. Gordon, "The distributed data interface in GAMESS," Computer Physics Communications, vol. 128, pp.
[47] J. L. Bentz, R. M. Olson, M. S. Gordon, M. W. Schmidt, and R. A. Kendall, "Coupled cluster algorithms for networks of shared memory parallel processors," Computer Physics Communications, vol. 176, pp.
[48] D. G. Fedorov, R. M. Olson, K. Kitaura, M. S. Gordon, and S. Koseki, "A new hierarchical parallelization scheme: Generalized distributed data interface (GDDI), and an application to the fragment molecular orbital method (FMO)," Journal of Computational Chemistry, vol. 25, pp.
[49] T. Ikegami, T. Ishida, D. G. Fedorov, K. Kitaura, Y. Inadomi, H. Umeda, M. Yokokawa, and S. Sekiguchi, "Full electron calculation beyond 20,000 atoms: Ground electronic state of photosynthetic proteins," in Proceedings of the ACM/IEEE Supercomputing 2005 Conference, Seattle, pp.
[50] Y. Alexeev. FMO portal: Web interface for FMOtools. Available: (January 16, 2012)
[51] D. G. Fedorov, Y. Alexeev, and K. Kitaura, "Geometry optimization of the active site of a large system with the fragment molecular orbital method," Journal of Physical Chemistry Letters, vol. 2, pp.
[52] D. G. Fedorov, T. Ishida, and K. Kitaura, "Multilayer formulation of the fragment molecular orbital method (FMO)," The Journal of Physical Chemistry A, vol. 109, pp.
[53] C. L. Janssen and I. M. B. Nielsen, Parallel computing in quantum chemistry. CRC Press.
[54] T. Ibaraki and N. Katoh, Resource allocation problems: algorithmic approaches. Cambridge, MA. The MIT Press, 1988.
[55] R. J. Dakin, "A tree-search algorithm for mixed integer programming problems," The Computer Journal, vol. 8, pp.
[56] I. Quesada and I. E. Grossmann, "An LP/NLP based branch and bound algorithm for convex MINLP optimization problems," Computers & Chemical Engineering, vol. 16, pp.
[57] M. A. Duran and I. E. Grossmann, "An outer-approximation algorithm for a class of mixed-integer nonlinear programs," Mathematical Programming, vol. 36, pp.
[58] R. Fletcher and S. Leyffer, "Solving mixed integer nonlinear programs by outer approximation," Mathematical Programming, vol. 66, pp.
[59] T. Westerlund and F. Pettersson, "An extended cutting plane method for solving convex MINLP problems," Computers & Chemical Engineering, vol. 19, pp.
[60] A. Mahajan, S. Leyffer, and C. Kirches, "Solving mixed-integer nonlinear programs by QP-diving," Argonne National Laboratory ANL/MCS-P, 2012.
[61] R. Horst and T. Hoang, Global Optimization: Deterministic Approaches. Berlin. Springer-Verlag.
[62] M. Tawarmalani and N. V. Sahinidis, Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications, vol. 65. Dordrecht. Kluwer Academic Publishers.
[63] R. Fourer, D. M. Gay, and B. Kernighan, AMPL: A Modeling Language for Mathematical Programming, 2nd Edition. Independence, KY. Cengage Learning.
[64] J. J. Forrest. Clp project. Available: (January 16, 2012)
[65] R. Fletcher and S. Leyffer, "Nonlinear programming without a penalty function," Mathematical Programming, vol. 91, pp.
[66] T. Ikegami, T. Ishida, D. G. Fedorov, K. Kitaura, Y. Inadomi, H. Umeda, M. Yokokawa, and S. Sekiguchi, "Fragment molecular orbital study of the electronic excitations in the photosynthetic reaction center of Blastochloris viridis," Journal of Computational Chemistry, vol. 31, pp., 2010.


More information

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS Chrs Deeley* Last revsed: September 22, 200 * Chrs Deeley s a Senor Lecturer n the School of Accountng, Charles Sturt Unversty,

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

A Programming Model for the Cloud Platform

A Programming Model for the Cloud Platform Internatonal Journal of Advanced Scence and Technology A Programmng Model for the Cloud Platform Xaodong Lu School of Computer Engneerng and Scence Shangha Unversty, Shangha 200072, Chna [email protected]

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems

More information

An MILP model for planning of batch plants operating in a campaign-mode

An MILP model for planning of batch plants operating in a campaign-mode An MILP model for plannng of batch plants operatng n a campagn-mode Yanna Fumero Insttuto de Desarrollo y Dseño CONICET UTN [email protected] Gabrela Corsano Insttuto de Desarrollo y Dseño

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

Loop Parallelization

Loop Parallelization - - Loop Parallelzaton C-52 Complaton steps: nested loops operatng on arrays, sequentell executon of teraton space DECLARE B[..,..+] FOR I :=.. FOR J :=.. I B[I,J] := B[I-,J]+B[I-,J-] ED FOR ED FOR analyze

More information

Ants Can Schedule Software Projects

Ants Can Schedule Software Projects Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle [email protected] 2 Unversdad Fns Terrae,

More information

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers Ž. The Journal of Supercomputng, 15, 25 49 2000 2000 Kluwer Academc Publshers. Manufactured n The Netherlands. A Prefx Code Matchng Parallel Load-Balancng Method for Soluton-Adaptve Unstructured Fnte Element

More information

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1 Send Orders for Reprnts to [email protected] The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,

More information

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts Power-of-wo Polces for Sngle- Warehouse Mult-Retaler Inventory Systems wth Order Frequency Dscounts José A. Ventura Pennsylvana State Unversty (USA) Yale. Herer echnon Israel Insttute of echnology (Israel)

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry [email protected] www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany [email protected],

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler [email protected] Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures

An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures An ILP Formulaton for Task Mappng and Schedulng on Mult-core Archtectures Yng Y, We Han, Xn Zhao, Ahmet T. Erdogan and Tughrul Arslan Unversty of Ednburgh, The Kng's Buldngs, Mayfeld Road, Ednburgh, EH9

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

A frequency decomposition time domain model of broadband frequency-dependent absorption: Model II

A frequency decomposition time domain model of broadband frequency-dependent absorption: Model II A frequenc decomposton tme doman model of broadband frequenc-dependent absorpton: Model II W. Chen Smula Research Laborator, P. O. Box. 134, 135 Lsaker, Norwa (1 Aprl ) (Proect collaborators: A. Bounam,

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Examensarbete. Rotating Workforce Scheduling. Caroline Granfeldt

Examensarbete. Rotating Workforce Scheduling. Caroline Granfeldt Examensarbete Rotatng Workforce Schedulng Carolne Granfeldt LTH - MAT - EX - - 2015 / 08 - - SE Rotatng Workforce Schedulng Optmerngslära, Lnköpngs Unverstet Carolne Granfeldt LTH - MAT - EX - - 2015

More information

How To Solve An Onlne Control Polcy On A Vrtualzed Data Center

How To Solve An Onlne Control Polcy On A Vrtualzed Data Center Dynamc Resource Allocaton and Power Management n Vrtualzed Data Centers Rahul Urgaonkar, Ulas C. Kozat, Ken Igarash, Mchael J. Neely [email protected], {kozat, garash}@docomolabs-usa.com, [email protected]

More information

HÜCKEL MOLECULAR ORBITAL THEORY

HÜCKEL MOLECULAR ORBITAL THEORY 1 HÜCKEL MOLECULAR ORBITAL THEORY In general, the vast maorty polyatomc molecules can be thought of as consstng of a collecton of two electron bonds between pars of atoms. So the qualtatve pcture of σ

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal [email protected] Peter Möhl, PTV AG,

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE, FEBRUARY ISSN 77-866 Logcal Development Of Vogel s Approxmaton Method (LD- An Approach To Fnd Basc Feasble Soluton Of Transportaton

More information

2008/8. An integrated model for warehouse and inventory planning. Géraldine Strack and Yves Pochet

2008/8. An integrated model for warehouse and inventory planning. Géraldine Strack and Yves Pochet 2008/8 An ntegrated model for warehouse and nventory plannng Géraldne Strack and Yves Pochet CORE Voe du Roman Pays 34 B-1348 Louvan-la-Neuve, Belgum. Tel (32 10) 47 43 04 Fax (32 10) 47 43 01 E-mal: [email protected]

More information

Chapter 4 ECONOMIC DISPATCH AND UNIT COMMITMENT

Chapter 4 ECONOMIC DISPATCH AND UNIT COMMITMENT Chapter 4 ECOOMIC DISATCH AD UIT COMMITMET ITRODUCTIO A power system has several power plants. Each power plant has several generatng unts. At any pont of tme, the total load n the system s met by the

More information

Damage detection in composite laminates using coin-tap method

Damage detection in composite laminates using coin-tap method Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea [email protected] 45 The con-tap test has the

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

Enabling P2P One-view Multi-party Video Conferencing

Enabling P2P One-view Multi-party Video Conferencing Enablng P2P One-vew Mult-party Vdeo Conferencng Yongxang Zhao, Yong Lu, Changja Chen, and JanYn Zhang Abstract Mult-Party Vdeo Conferencng (MPVC) facltates realtme group nteracton between users. Whle P2P

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Section 5.4 Annuities, Present Value, and Amortization

Section 5.4 Annuities, Present Value, and Amortization Secton 5.4 Annutes, Present Value, and Amortzaton Present Value In Secton 5.2, we saw that the present value of A dollars at nterest rate per perod for n perods s the amount that must be deposted today

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

A Simple Approach to Clustering in Excel

A Simple Approach to Clustering in Excel A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa

More information

Adaptive Fractal Image Coding in the Frequency Domain

Adaptive Fractal Image Coding in the Frequency Domain PROCEEDINGS OF INTERNATIONAL WORKSHOP ON IMAGE PROCESSING: THEORY, METHODOLOGY, SYSTEMS AND APPLICATIONS 2-22 JUNE,1994 BUDAPEST,HUNGARY Adaptve Fractal Image Codng n the Frequency Doman K AI UWE BARTHEL

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

Formulating & Solving Integer Problems Chapter 11 289

Formulating & Solving Integer Problems Chapter 11 289 Formulatng & Solvng Integer Problems Chapter 11 289 The Optonal Stop TSP If we drop the requrement that every stop must be vsted, we then get the optonal stop TSP. Ths mght correspond to a ob sequencng

More information

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University Characterzaton of Assembly Varaton Analyss Methods A Thess Presented to the Department of Mechancal Engneerng Brgham Young Unversty In Partal Fulfllment of the Requrements for the Degree Master of Scence

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

Research Article Enhanced Two-Step Method via Relaxed Order of α-satisfactory Degrees for Fuzzy Multiobjective Optimization

Research Article Enhanced Two-Step Method via Relaxed Order of α-satisfactory Degrees for Fuzzy Multiobjective Optimization Hndaw Publshng Corporaton Mathematcal Problems n Engneerng Artcle ID 867836 pages http://dxdoorg/055/204/867836 Research Artcle Enhanced Two-Step Method va Relaxed Order of α-satsfactory Degrees for Fuzzy

More information

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background: SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and

More information

J. Parallel Distrib. Comput. Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers

J. Parallel Distrib. Comput. Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers J. Parallel Dstrb. Comput. 71 (2011) 732 749 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. ournal homepage: www.elsever.com/locate/pdc Envronment-conscous schedulng of HPC applcatons

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng

More information

A Design Method of High-availability and Low-optical-loss Optical Aggregation Network Architecture

A Design Method of High-availability and Low-optical-loss Optical Aggregation Network Architecture A Desgn Method of Hgh-avalablty and Low-optcal-loss Optcal Aggregaton Network Archtecture Takehro Sato, Kuntaka Ashzawa, Kazumasa Tokuhash, Dasuke Ish, Satoru Okamoto and Naoak Yamanaka Dept. of Informaton

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

A Lyapunov Optimization Approach to Repeated Stochastic Games

A Lyapunov Optimization Approach to Repeated Stochastic Games PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT. 2013 1 A Lyapunov Optmzaton Approach to Repeated Stochastc Games Mchael J. Neely Unversty of Southern Calforna http://www-bcf.usc.edu/

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001 Proceedngs of the Annual Meetng of the Amercan Statstcal Assocaton, August 5-9, 2001 LIST-ASSISTED SAMPLING: THE EFFECT OF TELEPHONE SYSTEM CHANGES ON DESIGN 1 Clyde Tucker, Bureau of Labor Statstcs James

More information

SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS

SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS Magdalena Rogalska 1, Wocech Bożeko 2,Zdzsław Heduck 3, 1 Lubln Unversty of Technology, 2- Lubln, Nadbystrzycka 4., Poland. E-mal:[email protected]

More information